Preface


Modeling  and  simulation-based  (MODSIM)  engineering  and  science  is  rapidly  becoming  an 
essential  scientific  methodology  for  nearly  all  areas  of  engineering  and  many  branches  of  science 
and  for  research,  development,  concept  generation,  product  design  and  manufacturing,  and 
consumer  marketing.  Continuing  advances  in  computational  science  and  networking 
technologies  have  made  MODSIM-based  engineering  and  science  a “powerful  and  ubiquitous 
tool”  for  engineers  and  scientists  and  have  made  it  possible  to  extend  the  range,  depth,  and 
applications  of  MODSIM  vastly,  especially  when  the  phenomena  being  investigated  are  not 
observable  or  measurements  are  impractical  or  too  expensive.  According  to  a National  Science 
Foundation  Blue-Ribbon  Panel,  MODSIM-based  engineering  and  science  (1)  is  an  equal  and 
indispensable  partner,  along  with  theory  and  experiment,  in  the  quest  for  enhanced  technological 
innovation;  (2)  holds  great  promise  for  the  pervasive  advancement  of  knowledge  and 
understanding  through  discovery;  (3)  is  indispensable  to  the  nation's  continued  leadership  in 
innovation  and  economic  global  competitiveness;  and  (4)  is  “key”  to  advances  in  a variety  of 
fields  - biomedicine,  manufacturing,  systems  engineering,  nanotechnology,  health  care, 
atmospheric  and  climate  science,  energy  and  environmental  sciences,  advanced  materials,  and 
product  development.  MODSIM-based  engineering  and  science  is  also  essential  to  the  success  of 
NASA’s  research,  missions,  and  projects. 

The MODSIM World Conference and Expo began in 2007 when the Hampton Roads
Partnership’s Center for Public/Private Partnership (CPS) saw the need to share information
about, and interest in, the vast amount of MODSIM-based research and development occurring in
the Hampton Roads region of Virginia. Because of the synergy created by the efforts of Joint
Forces Command; the Virginia Modeling, Analysis and Simulation Center (VMASC); Eastern
Virginia Medical School (EVMS); the NASA Langley Research Center (LaRC); and others, it became
obvious to the CPS membership that there was a need to establish an “interdisciplinary” forum
for sharing MODSIM knowledge and achievements. Their efforts created the MODSIM World
Conference & Expo, which is now in its fourth cycle.

MODSIM  World  2010  was  held  in  Hampton,  Virginia,  October  13-15,  2010.  The  theme  of  the 
2010 conference & expo was “21st Century Decision-Making: The Art of Modeling &
Simulation”. The conference program consisted of seven technical tracks: Defense, Engineering
and  Science,  Health  & Medicine,  Homeland  Security  & First  Responders,  The  Human 
Dimension,  K-20  STEM  Education,  and  Serious  Games  & Virtual  Worlds.  Selected  papers  and 
presentations  from  MODSIM  World  2010  Conference  & Expo  are  contained  in  this  NASA 
Conference  Publication  (CP).  Section  8.0  of  this  CP  contains  papers  from  MODSIM  World  2009 
Conference  & Expo  that  were  unavailable  at  the  time  of  publication  of  NASA/CP-2010-216205 
Selected  Papers  Presented  at  MODSIM  World  2009  Conference  and  Expo,  March  2010. 

As  a condition  for  inclusion  in  the  conference  proceedings,  the  first  author  was  responsible  for 
securing/obtaining  all  permissions  associated  with  the  general  release  and  public  availability  of 
the  paper/presentation.  Further,  the  first  authors  also  had  to  grant  NASA  the  right  to  include  their 
work  in  the  NASA  CP.  There  are  62  papers  and  41  presentations  in  this  NASA  CP.  There  are 
two  appendices  in  this  publication.  Appendix  A contains  the  names  and  affiliations  of  the 
conference organizers. Appendix B includes a description of the technical tracks and the names
of the individuals who organized each track. Preparing the proceedings of this conference
required  the  collaborative  efforts  of  many  individuals. 

The preparation of NASA/CP-2011-217069 would not have been possible without the expertise
and  contributions  of  Ms.  Leanna  “Dee”  Bullock  (ATK  Space  Systems,  INC.)  and  Ms.  Jennifer 
McNamara  (BreakAway,  LTD.). 
Examining Passenger Flow Choke Points at Airports Using Discrete
Event  Simulation 


Jeremy  R.  Brown  & Poornima  Madhavan 
Old  Dominion  University 
irbrown@odu.edu  pmadhava@odu.edu 

Abstract. The movement of passengers through an airport quickly, safely, and efficiently is the main function of the various checkpoints (check-in,
security, etc.) found in airports. Human error combined with other breakdowns in the complex system of the airport can disrupt passenger
flow  through  the  airport  leading  to  lengthy  waiting  times,  missing  luggage  and  missed  flights.  In  this  paper  we  present  a model  of  passenger 
flow  through  an  airport  using  discrete  event  simulation  that  will  provide  a closer  look  into  the  possible  reasons  for  breakdowns  and  their 
implications  for  passenger  flow.  The  simulation  is  based  on  data  collected  at  Norfolk  International  Airport  (ORF).  The  primary  goal  of  this 
simulation  is  to  present  ways  to  optimize  the  work  force  to  keep  passenger  flow  smooth  even  during  peak  travel  times  and  for  emergency 
preparedness at ORF in case of adverse events. In this simulation we ran three different scenarios: real world, increased check-in stations, and
multiple  waiting  lines.  Increased  check-in  stations  increased  waiting  time  and  instantaneous  utilization,  while  the  multiple  waiting  lines 
decreased  both  the  waiting  time  and  instantaneous  utilization.  This  simulation  was  able  to  show  how  different  changes  affected  the  passenger 
flow  through  the  airport. 


1.0  INTRODUCTION 

At  the  turn  of  the  millennium,  erroneous 
information  typed  into  a central  database  at  Hong 
Kong's  $20  billion  Chek  Lap  Kok  airport  triggered  a 
domino  effect  that  sent  the  new  facility  into  almost 
comic  confusion:  flights  taking  off  without  luggage, 
airport  officials  tracking  flights  with  plastic  pieces 
on  a magnetic  board,  and  airlines  calling  confused 
ground  staff  on  cellular  phones  to  say  where  even 
more  confused  passengers  could  find  their  planes. 
Similar  scenes  were  played  out  at  Malaysia’s  $2.2 
billion  Kuala  Lumpur  International  Airport,  where 
stranded  cargo  translated  quickly  in  the  tropical 
heat  into  rotting  refuse.  Such  examples  drive  home 
one  of  the  oldest  rules  of  computer  programming, 
the  simple  postulate  that  a machine  is  only  as 
good  as  the  humans  using  it. 

Clearly  airports  are  very  complex  environments  in 
which  passengers  are  the  consumers  and 
efficiency  is  the  key  to  organizing  the  complexity. 
Airports  can  be  thought  of  as  systems  with  many 
parts  that  need  to  work  together  in  order  to 
accomplish  a task.  This  task  is  to  get  passengers 
through  the  airport  and  onto  waiting  airplanes. 

This  system  can  break  down  when  problems 
occur.  Therefore,  modeling  of  processes  to 
optimize  traffic  flow  is  where  emergency  planning 
can  come  into  play.  Some  of  the  integral 
components  of  an  airport  are  infrastructure 
features  such  as  buildings,  passenger  ground 
transport  systems,  runways,  taxiways,  and 
vehicles  (needed  for  getting  baggage,  fuel,  and 
food  onto  the  planes). 


Additional  features  of  the  system  are  the  computer 
systems  such  as  baggage  check  computers  and  x- 
ray  baggage  machines.  The  final  link  in  the  airport 
system  is  the  human  component,  i.e.,  workers  that 
operate  the  machinery  and  computers. 

1.1  Role  of  the  runway 

The  runway  plays  an  important  part  in  regulating 
traffic flow by allowing aircraft to land and take off
safely.  Taxiways  serve  the  same  purpose, 
although  they  are  primarily  used  to  get  the  planes 
from  the  runway  to  the  terminal.  The  bigger  the 
aircraft,  the  longer  and  tougher  the  runway  and 
taxiways  must  be  to  handle  the  weight. 

1.2  Role  of  computer  systems 

The  computer  systems  in  an  airport  are  important 
to  the  flow  of  traffic  in  that  they  help  keep  track  of 
all  the  flights  coming  and  going,  as  well  as  the  flow 
of  passengers  and  their  baggage.  In  addition, 
computer  systems  play  an  important  role  in  airport 
security  by  screening  luggage,  and  profiling 
passengers  using  video  cameras. 

1.3  The  human  component 

The  presence  of  humans  is  integral  to  the  running 
of  all  the  above  components.  The  workers  that 
operate  the  systems  are  an  important  factor  to 
take  into  account  when  looking  at  the  airport  as  a 
system  of  systems.  It  is  the  humans  that  make  the 
decisions,  and  keep  the  other  systems  working.  A 
significant  proportion  of  errors  in  these  systems, 
therefore,  are  due  to  incidences  of  human  error. 
This  raises  the  importance  of  modeling  human 
behavior  to  better  understand  the  behavioral 
implications on traffic flow in a large system of
systems  such  as  an  airport. 

A few  attempts  have  been  made  so  far  to  quantify 
and  model  passenger  flow  in  various  contexts 
ranging  from  train  station  platforms  to  elevators  of 
tall buildings [1,2]. Nahke created a simulation of
Hartsfield Atlanta International Airport’s passenger
movement system, which consisted of nine trains
moving passengers from terminal to terminal [3].
Using this simulation, Nahke examined
how increasing the number of
trains would affect passenger capacity.
The study showed that, with small
changes, the train system designed for a
maximum of nine trains could easily handle ten
trains, increasing passenger capacity [3].

Ke,  Zizheng,  and  Liling  used  simulation  to  optimize 
bus  schedules  during  peak  times  [4].  Wusheng 
and  Qian  created  a simulation  using  queuing 
theory  to  examine  passenger  flow  at  the  curbside 
of  an  airport  [5].  Another  study  examined  the  flow 
of  traffic  in  an  airport  through  simulation  and 
modeling  in  a similar  way  to  what  is  being 
proposed [6]. Although this study effectively
examined the problem of passenger flow from the
standpoint of scheduling, the model ignored the
degree of heterogeneity among the passengers
themselves, which is largely accountable for several
system bottlenecks. Specifically, the model did not
take into account passenger behaviors
related to their degree of flight experience,
physical abilities, presence of children, etc., which
would certainly affect the overall rate of
passenger flow through an airport. Furthermore,
the  earlier  model  is  dated  and  does  not  include 
data  on  baggage  screening  procedures,  which  are 
an  integral  component  of  airport  security  in  the 
present  day. 

Simulation  can  also  be  used  in  the  design  process. 
It  can  be  used  to  look  at  how  people  will  move 
through  a building,  or  to  see  how  a change  affects 
the  rest  of  the  system  being  designed.  Brown  and 
Garcia  [7]  used  Simulink,  in  Matlab,  to  help  design 
a control  system  for  unmanned  aerial  vehicle 
helicopters.  This  allowed  them  to  try  different 
control  systems  without  incurring  the  cost  of 
building them and testing them in the real world.

The  goal  of  our  current  research  therefore  is  to 
develop  a working  model  of  an  airport  using 
discrete  event  simulation  with  particular  emphasis 


on  homeland  security.  The  simulation  can  be  used 
for  homeland  security  purposes  to  understand 
better  where  workers  are  needed  to  provide 
optimal  security  for  waiting  passengers. 

Through  this  model,  we  represent  traffic  flow 
through  an  airport  as  a chronological  set  of  events 
that  is  tied  in  to  passenger  behavior.  Each  event 
(e.g., arriving at check-in, carry-on baggage check,
and  final  ticket  check)  occurs  as  an  instant  in  time 
and  marks  a change  of  state  in  the  system.  The 
simulation  was  designed  using  ARENA  Discrete 
Event  Simulation  software  as  described  in  the  next 
section.  Discrete  Event  Simulation  (DES)  software 
was  created  to  simulate  real  world  events  that 
have  random  components  to  them  and  that  are  not 
time  driven.  How  the  simulation  moves  forward  is 
based  on  arrival  and  service  times  drawn  from  a 
random  number  generator,  which  can  be  given 
functions  from  which  to  draw  these  numbers. 

These random times determine when an entity will arrive
and how long it takes to process the entity. DES was
chosen for the airport simulation for three reasons: such
models are simple to create, the random arrival and
service times can be reproduced, and the arrival and
processing of passengers are driven by events rather
than by the steady advance of a clock.
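
The event-driven advance of the simulation clock can be illustrated with a minimal sketch in Python. This is not the ARENA model: it assumes a single first-come, first-served server, and the exponential arrival and service samplers, the function name, and all parameter values are hypothetical placeholders rather than values from the airport data.

```python
import heapq
import random
from collections import deque


def simulate_single_queue(n_passengers, mean_interarrival, mean_service, seed=0):
    """Minimal discrete event simulation of one first-come, first-served server.

    The clock jumps from one event to the next; nothing happens in between.
    All parameters are hypothetical, not values fitted to the airport data.
    """
    rng = random.Random(seed)
    events = []  # priority queue of (time, tie_breaker, kind, passenger_id)

    # Schedule every arrival up front from random interarrival times.
    t = 0.0
    for pid in range(n_passengers):
        t += rng.expovariate(1.0 / mean_interarrival)
        heapq.heappush(events, (t, pid, "arrive", pid))

    tie_breaker = n_passengers  # keeps heap entries unique and comparable
    waiting = deque()           # line of (passenger_id, arrival_time)
    server_busy = False
    waits = {}                  # passenger_id -> time spent waiting in line

    def start_service(now, pid, arrived):
        nonlocal tie_breaker
        waits[pid] = now - arrived
        done = now + rng.expovariate(1.0 / mean_service)
        heapq.heappush(events, (done, tie_breaker, "depart", pid))
        tie_breaker += 1

    while events:
        # Advance the clock directly to the next scheduled event.
        now, _, kind, pid = heapq.heappop(events)
        if kind == "arrive":
            if server_busy:
                waiting.append((pid, now))
            else:
                server_busy = True
                start_service(now, pid, now)
        else:  # "depart": the server frees up and takes the next in line
            if waiting:
                next_pid, arrived = waiting.popleft()
                start_service(now, next_pid, arrived)
            else:
                server_busy = False
    return waits
```

Calling `simulate_single_queue(50, 1.0, 0.5)` returns one waiting time per passenger; changing the seed changes the random arrival and service times, while reusing a seed reproduces them.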

2.0  THE  SIMULATION 

2.1  Materials 

Laptop  computer  with  Windows  XP  running 
ARENA  DES  Software  Version  10.0  build  30. 

2.2  Simulation  Components 

The  simulation  can  be  broken  into  multiple 
components  each  of  which  is  combined  in  different 
places  of  the  simulation  to  create  the  integrated 
airport  simulation. 

2.2.1  Creation  module 

This  module  is  used  to  populate  the  simulation 
with  entities,  which  in  this  simulation  are 
passengers.  The  creation  module  determines  how 
many  passengers  are  going  to  arrive  at  the  airport, 
and how often they arrive. Because a creation
module adds passengers to the simulation, a delete
module must be used at the end of the simulation
to remove the passengers from it.
2.2.2 Assign module

This  module  allows  specific  attributes  to  be 
assigned  to  the  passengers,  such  as  a function  to 
predict  how  long  it  should  take  the  passenger  to 
get  through  the  baggage  check-in. 

2.2.3  Decision  module 

The decision module is used to route passengers
through a choice. For example, one decision
module routes the passenger to either the
automated self check-in or the manned check-in
counter based on random chance, a fixed
percentage of passengers, or a formula.

2.2.4 Process module

This  module  is  used  to  carry  out  a specific 
process,  such  as  the  check-in  process  or  the 
luggage  screening  process.  Each  process  has 
specific  resources  that  are  assigned  to  it,  such  as 
the  security  screeners,  baggage  handlers  and 
check-in  agents. 
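
How the four module types fit together can be sketched in Python as follows. This is an illustration only, not ARENA code: the `Passenger` class, the function names, the exponential check-in time, and the 40% self check-in share are all hypothetical, and queueing for resources is deliberately left out.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Passenger:
    """Entity that flows through the model."""
    pid: int
    attributes: dict = field(default_factory=dict)
    route: list = field(default_factory=list)


def create(n_passengers):
    """Creation module: populate the model with entities."""
    return [Passenger(pid=i) for i in range(n_passengers)]


def assign(passenger, rng, mean_checkin_s):
    """Assign module: attach a sampled check-in time (hypothetical distribution)."""
    passenger.attributes["checkin_time_s"] = rng.expovariate(1.0 / mean_checkin_s)


def decide(rng, self_checkin_share):
    """Decision module: route by a fixed percentage of passengers."""
    return "self check-in" if rng.random() < self_checkin_share else "manned check-in"


def process(passenger, station):
    """Process module: record the station visited and return the time spent there."""
    passenger.route.append(station)
    return passenger.attributes["checkin_time_s"]


def run(n_passengers=100, self_checkin_share=0.4, mean_checkin_s=90.0, seed=1):
    rng = random.Random(seed)
    counts = {"self check-in": 0, "manned check-in": 0}
    for passenger in create(n_passengers):
        assign(passenger, rng, mean_checkin_s)
        station = decide(rng, self_checkin_share)
        process(passenger, station)
        counts[station] += 1
    # Delete module: entities leave the model once the loop is done with them.
    return counts
```

`run()` returns how many passengers were routed to each kind of check-in; replacing the body of `decide` with a formula on passenger attributes would correspond to the formula-based routing mentioned above.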

2.3  Data  Collection 

Data  for  the  simulation  was  collected  from  the 
Norfolk  International  Airport  with  consent  from  the 
different  airlines  and  also  the  Transportation 
Security  Administration  for  the  airport.  Data  was 
collected  between  7:00  am  and  3:00  pm  Monday 
through  Thursday  for  two  weeks.  Arrival  times 
were  collected  by  using  a stop  watch  and 
measuring  the  time  between  each  passenger 
crossing  a particular  point  when  arriving  into  the 
airport  building.  These  times  were  then  recorded 
for  later  use  in  the  simulation. 

The processing times for the check-in were
measured by observing passengers checking in.
The stop watch was started when the passenger
began talking with the ticket counter agent or first
touched the computer screen, and it was stopped
when the passenger gathered their luggage and
moved away. This
data was recorded for later use in the simulation.

Processing  times  for  the  carry-on  luggage 
screening  were  collected  by  observing  passengers 
going through the security checkpoint. The stop
watch  was  started  once  passengers  put  their 
luggage  on  the  conveyor  belt  and  stepped  away  to 
go  through  the  metal  detector.  The  time  was 
stopped  once  they  picked  up  their  luggage. 


These  different  times  were  put  into  the  input 
analyzer  of  Arena  DES  so  that  an  equation  could 
be  fit  to  the  data  and  then  put  into  the  simulation. 
See  table  1 for  the  airport  data. 
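
The idea of fitting an expression to recorded times and then sampling from it can be sketched in Python. This is a simple moment-based fit of an "offset + exponential" form, not necessarily the procedure the ARENA input analyzer uses; the function names and the sample observations are hypothetical.

```python
import random
from statistics import mean


def fit_shifted_exponential(samples):
    """Fit 'offset + EXPO(scale)' to observed times with a simple moment-based rule.

    Assumption: the offset is taken as the smallest observation and the scale
    as the sample mean minus that offset.
    """
    offset = min(samples)
    scale = mean(samples) - offset
    if scale <= 0:
        raise ValueError("samples must not all be equal")
    return offset, scale


def draw(offset, scale, rng):
    """Draw one time from the fitted expression."""
    return offset + rng.expovariate(1.0 / scale)


# Hypothetical stop watch readings in seconds, for illustration only.
observed = [54.0, 60.0, 66.0]
offset, scale = fit_shifted_exponential(observed)
rng = random.Random(0)
simulated_times = [draw(offset, scale, rng) for _ in range(5)]
```

Every simulated time is at least as large as the fitted offset, which mirrors the constant terms in the fitted expressions.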

2.4  The  Airport  Simulation 

Since  this  simulation  deals  primarily  with 
passenger  flow  through  an  airport,  the  only  parts  of 
the  airport  that  were  simulated  were  those  that 
directly  affect  the  passengers  themselves  as  they 
enter  and  travel  through  the  airport,  and  finally 
board  their  plane. 

Two main areas were used in the simulation:
(i) the initial check-in
(ii) the carry-on luggage screening

Table 1
Input Equations

Measurement               Fitted expression
Arrival times             -0.001 + GAMM(0, 0)
Manned check-in times     54 + EXPO(0)
Self-check-in times       60 + WEIB(0, 0)
Security check point      10 + WEIB(0, 0)
See  Figure  1 for  a diagram  of  the  ARENA 
simulation.  These  two  points  were  chosen 
because  they  are  the  points  where  passenger  flow 
is  controlled  by  airport  authorities,  yet  have  the 
most  impact  on  passenger  behavior.  The  time 
when  the  passenger  arrives  at  the  airport  cannot 
be  controlled,  and  is  therefore  a random  variable 
within  the  simulation,  and  is  treated  as  such. 

The  arrival  times  of  passengers  are  randomized 
based  on  data  collected  at  the  Norfolk  International 
Airport.  The  passengers  were  categorized  based 
on  the  main  air  carriers  operating  at  the  Norfolk 
Airport: 

• American/Continental  Airlines 

• Southwest  Airlines 

• USAirways 

American  and  Continental  Airlines  were  grouped 
together  due  to  the  extremely  low  passenger  rate 
observed  at  the  airport.  Each  passenger  category 
was  assigned  a different  process  time  based  on 
times  collected  from  each  processing  area.  The 
check-in area (see Figure 1) is divided into self
check-in  and  manned  check-in,  for  each  airline. 

For  the  self  check-in,  the  primary  resource  is  the 
automated  check-in  machine.  For  the  manned 
station the primary resource is personnel manually
checking  the  passengers  and  their  luggage. 

The  next  area  the  passengers  went  through  was 
the  luggage  screening  security  checkpoint  (see 
Figure  1).  Each  passenger  goes  through  this 
section, just as they do in the real world. Random
stops could be initiated in the simulation at
this point. For example, the number of passengers
stopped could be set as a predetermined
percentage, and that share of passengers would be
stopped. Alternatively, certain passengers can be
assigned a particular attribute tag such as race,
gender or physical ability; those passengers
would then be stopped more often in the simulation
than other types of passengers. The simulation
was run for 80 iterations, each iteration representing
one 24-hour day.


3.0  SIMULATION  RESULTS 

After  running  the  simulation,  the  average  number 
of  entities  that  entered  the  simulation  was  179.14 
for  American/Continental,  1068.43  for  Southwest, 
and  957.09  for  USAirways.  See  table  2 for  the 
range  and  average  wait  times. 

The wait time for USAirways in the simulation
indicated a significant difference between the
manned check-in and self-check-in (t(78) = 4.33, p
< .001), with the manned check-in having a lower
wait time (M = 2.64, SD = 1.23) than the
self-check-in (M = 12.07, SD = 3.98). The manned
check-in and automated check-in for
American/Continental and Southwest airlines were
not statistically different (t(78) = 0.16, p = ns; t(78)
= 0.07, p = ns). In the simulation, Southwest’s
manned and self-check-in (t(78) = 2.11, p < .05;
t(78) = 2.72, p < .01) and USAirways’s self-check-in
(t(78) = 5.10, p < .001) had significantly longer
wait times than did the security checkpoints.
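
The comparisons above are two-sample t-tests. A minimal Python sketch of a pooled-variance t statistic is given here for orientation; the source does not state which variant of the test was used or the size of each group, so the function name and the sample values are hypothetical. With a pooled variance, the degrees of freedom equal the combined number of observations minus two.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for a pooled-variance two-sample t-test."""
    na, nb = len(sample_a), len(sample_b)
    # Pooled estimate of the common variance, weighted by degrees of freedom.
    sp2 = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / (na + nb - 2)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(sp2 * (1.0 / na + 1.0 / nb))
    return t, na + nb - 2


# Hypothetical wait times in minutes, for illustration only.
manned_waits = [1.0, 2.0, 3.0]
self_waits = [2.0, 3.0, 4.0]
t_stat, dof = pooled_t(manned_waits, self_waits)
```

The statistic would then be compared against a t distribution with `dof` degrees of freedom to obtain the p value.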


Fig.  1.  ARENA  diagram  of  Airport  Simulation.  The  red  area  represents  the  check-in  area.  The  green  area 
represents  the  carry-on  luggage  check  points. 

Table 2
Wait Time

Airline/Checkpoint         Average     Minimum   Max
American_Continental
  manned                   2.51        0.00      33.28
  self check-in            2.71        0.00      56.57
Southwest
  manned                   6.74*       0.00      67.72
  self check-in            6.60*       0.00      75.26
USAir
  manned                   2.64        0.00      33.92
  self check-in            12.07***    0.00      105.48
Security Check point 1     1.46        0.00      17.71
Security Check point 2     1.45        0.00      1SJ8

Note: time in minutes;
*p<.05, ***p<.001


Instantaneous  utilization  is  another  way  to 
look  at  how  resources  are  being  used  within 
the  simulation.  See  table  3.  Instantaneous 
utilization  shows  the  percentage  of  time  that 
the  resource  was  used.  The  higher  the 
percentage,  the  more  the  resource  was  used. 


Table 3
Instantaneous Utilization

Type of Service            Average
Norfolk Airport
  Manned Check-in          78.40%
  Self Check-in            83.38%
  Security Checkpoint      82.43%
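
As a rough illustration of what a utilization percentage means, the Python sketch below computes the share of a run during which a single resource is busy. It assumes one resource unit and non-overlapping busy intervals; the function name and the intervals are hypothetical, and ARENA's own statistic may be computed differently.

```python
def utilization(busy_intervals, run_length):
    """Percentage of the run during which a single resource was busy.

    Assumes one resource unit and non-overlapping (start, end) intervals
    expressed in the same time unit as run_length.
    """
    busy = sum(end - start for start, end in busy_intervals)
    return 100.0 * busy / run_length


# Hypothetical example: busy from hour 0 to 6 and from hour 12 to 18 of a 24-hour day.
example_pct = utilization([(0.0, 6.0), (12.0, 18.0)], 24.0)
```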

4.0  OPTIMIZING  PASSENGER  FLOW 

As the results indicate, Southwest and USAir
have the longest waiting times for both
manned check-in and self check-in. One would
assume that increasing the number of stations
would decrease these wait times. As table 4
shows, however, the
wait times actually increase significantly with
increased stations, in this case 2 additional
workers and 10 additional self check-in
stations.

Another way to optimize the Southwest and
USAir wait times is to divide the check-in
stations into multiple lines, in this case two.
This allows passengers to choose the
shorter line, decreasing their wait time. As Table 5
shows, dividing the check-in lines into two
for Southwest and USAir cut wait times
by over half.

These  wait  times,  however,  do  not  tell  the 
whole  story.  To  get  the  full  story,  we  also 
need  to  examine  the  resource  utilization  for 
these  two  changes.  See  table  6.  By 
increasing  the  number  of  check-in  stations, 
utilization  was  increased  by  five  to  six  percent, 
though  the  security  checkpoint  utilization  was 
decreased.  So  even  though  the  wait  times 
were  increased,  the  resource  usage  was  also 
increased.  By  dividing  the  waiting  lines,  table 
6 shows  that  resource  usage  was  cut  by  37- 
39  percent  for  the  check-in  stations.  This 
means  that  the  workers  were  only  working  for 
9-10  hours  of  the  24  when  the  waiting  lines 
were  split  in  two. 
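
The utilization comparison can be reproduced from the averages reported in Tables 3 and 6. The short Python sketch below uses those table values directly; the dictionary keys and helper names are illustrative, differences are expressed in percentage points, and the hours figure assumes the 24-hour iteration described earlier.

```python
# Average instantaneous utilization (percent), copied from Tables 3 and 6.
baseline = {"manned": 78.40, "self": 83.38, "security": 82.43}
two_lines = {"manned": 40.70, "self": 43.89, "security": 82.32}
more_stations = {"manned": 85.39, "self": 89.11, "security": 79.69}


def change_in_points(before, after):
    """Difference in percentage points for each type of service."""
    return {name: round(after[name] - before[name], 2) for name in before}


def busy_hours(utilization_pct, day_hours=24.0):
    """Hours of a simulated day during which the resource is in use."""
    return round(utilization_pct / 100.0 * day_hours, 2)


two_line_change = change_in_points(baseline, two_lines)
more_station_change = change_in_points(baseline, more_stations)
two_line_hours = {name: busy_hours(two_lines[name]) for name in ("manned", "self")}
```

For the two-line scenario this gives a drop of 37.70 points for manned check-in and 39.49 points for self check-in, which corresponds to about 9.77 and 10.53 busy hours per 24-hour day.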

Table 4
Wait Time

Airline/Checkpoint               Average     Min     Max
American_Continental
  manned                         2.51        0.00    33.28
  self check-in                  2.71        0.00    56.57
  manned 2 additional            1.38        0.00    33.28
  self check-in 10 additional    0.83        0.00    33.28
Southwest
  manned                         6.74        0.00    67.72
  self check-in                  6.60        0.00    75.26
  manned 2 additional            25.13***    0.00    33.28
  self check-in 10 additional    34.05***    0.00    33.28
USAir
  manned                         2.64        0.00    33.92
  self check-in                  12.07       0.00    105.48
  manned 2 additional            24.63***    0.00    33.28
  self check-in 10 additional    34.45***    0.00    33.28

Note: time in minutes
***p<.001


Table 5
Wait Time

Airline/Checkpoint         Average
American_Continental
  manned                   2.51
  self check-in            2.71
Southwest
  manned                   6.74
  self check-in            6.60
  manned 2 lines           0.19***
  self check-in 2 lines    0.088***
USAir
  manned                   2.64
  self check-in            12.07
  manned 2 lines           0.06**
  self check-in 2 lines    0.14***

Note: time in minutes
**p<.01, ***p<.001

Table 6
Instantaneous Utilization

Type of Service            Average
Two Lines
  Manned Check-in          40.70%
  Self Check-in            43.89%
  Security Checkpoint      82.32%
plus 10 Self Check-in
  Manned Check-in          85.39%
  Self Check-in            89.11%
  Security Checkpoint      79.69%

5.0  DISCUSSION  AND  IMPLICATIONS 
FOR  OPTIMIZING  PASSENGER  FLOW 

Airplanes  move  a large  percentage  of  the 
population  - about  580  million  passengers  just 
in the year 2008 in the US [8]. When there is
a procedural  failure  in  any  one  section  of  an 
airport  it  can  have  a drastic  effect  on  the  entire 
air  transportation  system.  Therefore,  the 
primary  application  of  this  simulation  is  to 
assist  in  optimizing  traffic  flow  within  airports. 
The  results  from  the  simulation  runs  indicate 
that  the  chokepoint  at  Norfolk  International 
Airport  resides  with  the  initial  ticketing  and 
baggage  checkpoints. 


Southwest  and  USAirways  are  the  primary 
carriers  that  can  take  a number  of  actions  to 
try  and  reduce  the  waiting  time  associated 
with  check-in.  One  possible  method  of 
redressal  is  to  increase  the  number  of  self- 
check-in stations  so  that  more  people  can  use 
them  at  once.  Another  option  is  to  do  a 
usability  analysis  on  the  self-check-in  station 
to  make  sure  the  process  is  smooth,  efficient, 
and  easy  for  inexperienced  travelers  to  use. 
Finally, more workers could be brought in to
help the passengers check in.

The greatest utility of the airport model is
that the effects of these changes can be
tested in the simulation before changes to the
system are made. The number of
self-check-in stations and manned stations can be
repeatedly  adjusted  and  the  wait  times  can  be 
analyzed  to  see  what  the  optimum  number  is. 
The  effects  of  failures  and  emergencies  can 
also  be  examined  within  the  model. 

For  emergency  planning  and  error  redressal, 
the  ultimate  goal  is  to  try  and  plan  for  future 
events  by  using  past  experience  [9,10].  As 
described  above,  our  model  allows  for  the 
quantification  of  each  contingency  situation 
into  a discrete  variable.  These  discrete 
variables  include  passenger  behaviors  that 
can  be  quantified  to  create  individual  ‘agents’ 
that  exhibit  different  behaviors  at  different 
points  in  time.  Each  variable  is  then  built  into 
the  simulation  as  described  to  ultimately 
predict  the  parameters  required  for  optimal 
rate of passenger flow inside an airport. These
ideas will be explored in future testing of the
simulation model.


6.0  CONCLUSIONS 

The simulation model has indicated that there
are  choke  points  within  the  Norfolk 
International  Airport.  Those  choke  points  are 
the  check-in  stations  where  passengers  check 
their  luggage.  We  recommend  that  the 
airlines  in  charge  of  the  specific  stations 
should  decrease  the  wait  time  by  increasing 
the  number  of  staff  and/or  increasing  the 
number of self-check-in stations. Besides
the obvious economic advantages of
regulating passenger flow, minimizing choke
points will also ensure fewer instances of
confusion and crowding at airports, thereby
considerably strengthening passenger
security.


[10]  Anderson,  E.,  (2003)  “Be  prepared  for  the 
unforeseen,"  Journal  of  Contingencies  and 
Crisis  Management,  11, 129-131 
4.2 Integrating Advanced Airspace System Components in a NAS-Wide
Simulation 


MODSIM  WORLD 


October  13-15,  2010 
Hampton, Virginia


Patricia  Glaab 

NASA  Langley  Research  Center 


October  14,  2010 



Agenda 

• Organization  and  programs  supported 

• NAS-wide  simulation  for  systems  analysis 

• ACES  simulation  quick  overview 

• Enhancements  for  new  capabilities 

• Demonstration  videos 

• Future  research  possibilities 

Organization 

• Aeronautics  Systems  Analysis  Branch  (ASAB), 
NASA  Langley  Research  Center 

• Aircraft  and  airspace  system  concept  analysis 
— Both  customer  supplied  and  internally  defined 

— Identification  of  promising  new  technologies 
— Support  agency's  strategic  research  planning 

— Support  competitive  aerospace  proposal  generation 
and  evaluation 

— Use  and  advance  an  integrated  suite  of  tools  to 
conduct  this  analysis 



ASAB  Support  for  NextGen 

• NextGen  time  frames 

- Near-term  - by  2012 

- Mid-term  - by  2018 

- Final  capabilities  - post  2025 

• ASAB  supports  far-term  goals 

- Assumes  advanced  airspace  management  tools 

- Highly  automated  decision  making 

• Research  areas 

- Demand/capacity/constraint  analysis 

- Metroplex  operations 

NAS-Wide Simulations

• Systems  Analysis  for  NextGen  requires  capability 
to  model  at  National  Airspace  System  (NAS)  level 

• Focuses  on  overall  benefits,  rather  than  individual 
components  and  capabilities  of  a particular 
aircraft 

• Large  number  of  flights  modeled 

>FAA Terminal  Area  Forecast  (TAF)  report: 

• 30000 flights/day (current day avg, cont. US, commercial)

• > 40000  flights/day  - projected  for  2030 



NAS-Wide  Simulations 

• NASPAC  (FAA) 

"National  Airspace  System  Performance  Capability" 

• SIMMOD  (FAA) 

• PNP  (Sensis) 

"Probabilistic  NAS  Platform" 

• RAMS  - Eurocontrol  Experimental  Center 

"Reorganized  ATC  Mathematical  Simulator" 

• TAAM  (MITRE) 

"Total  Airport  and  Airspace  Model" 

• ACES  (NASA  Ames) 

"Airspace  Concepts  Evaluation  System" 

— Open  source 

ACES  Simulation  Overview 

• Developed  to  assess  system-wide  impacts  of 
airspace  technologies  and  operational 
concepts 

• Agent-based  simulation 

- Event-driven  components 

— Time-driven  components  (event  = time  step) 

• Provides  modeling  of  current  day  NAS 

• Extensible  (via  "plugins")  framework 
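
The mix of event-driven and time-driven components can be illustrated with a minimal discrete-event loop. This is a sketch only, not ACES or Cybele code: the names `Simulator`, `schedule`, and `schedule_periodic` are hypothetical. A time-driven component is modeled as an event that re-arms itself every time step, which is one way to read the slide's "event = time step".

```python
import heapq
import itertools


class Simulator:
    """Tiny discrete-event executive (hypothetical; not the Cybele API)."""

    def __init__(self):
        self.now = 0.0
        self._queue = []                  # heap of (time, seq, callback)
        self._seq = itertools.count()     # tie-breaker for equal times

    def schedule(self, delay, callback):
        """Event-driven component: run callback once after `delay` seconds."""
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), callback))

    def schedule_periodic(self, step, callback):
        """Time-driven component: an event that re-arms itself every `step`."""
        def tick():
            callback()
            self.schedule(step, tick)
        self.schedule(step, tick)

    def run(self, until):
        """Process events in time order up to and including `until`."""
        while self._queue and self._queue[0][0] <= until:
            self.now, _, callback = heapq.heappop(self._queue)
            callback()
        self.now = until


log = []
sim = Simulator()
sim.schedule(25.0, lambda: log.append((sim.now, "flight departs")))
sim.schedule_periodic(10.0, lambda: log.append((sim.now, "update positions")))
sim.run(until=30.0)
```

In this run the periodic update fires at 10, 20, and 30 seconds, and the one-off departure event is interleaved at 25 seconds, so both kinds of component share a single time-ordered queue.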


ACES  Capabilities 

• Uses Cybele (IAI) as core executive

• Agents  in  ACES  map  to  real  world  entities  in 
the  National  Airspace  System 

- Flights 

- Airports 
- TRACON ATC
- En-route ATC

- Surveillance 

— Physical  layout  of  airspace  (sectors,  centers 


ACES  Overview 


[Figure: ACES agent and activity diagram. ATSP agents: ATCSCC Agent; Airport, TRACON, and ARTCC TFM Agents; Airport, TRACON, and ARTCC ATC Agents. NAS User agents: AOC Agent and Flight Agent. Diagram notes: "Ground delay program and Ground stop behaviors are not currently ..." (note truncated in source); "The Advanced Airspace Concept is driven by a separate periodic timer."]


CDRL 18  ACES  Programmer's  Guide,  Rev  2,  p 13. 
ACES Visualization


CDRL 1?, Airspace Concept Evaluation System (ACES) User Guide, Version 4, p. 92.


ACES  Demonstration 


(Video  of  ACES  visualization  window  running  a 
typical  simulation  scenario  with  midday  traffic 
volume) 


ACES  Viewer 

• ACES  support  tool  for  post-run  visualization 

• Runs  using  IV4D 

- Built  for  Air  Force  Research  Labs  by  Aerospace 
Computing,  Inc  (ACI) 

• Visualizes  anything  with  lat/long/alt/time 
points 

• Extended  to  support  ACES  output  style 
ACES Viewer Demo


(Video of the previous ACES demo, now run in ACES Viewer with the 3-D view
rotated and manipulated)


ACES  Enhancements 




• ACES  provides  a powerful  framework,  but  must 
be  extended  for  new  concept  testing 

- Merging  and  Spacing  (M&S)  in  the  airport  vicinity 

- Conflict  Detection  and  Resolution  (CD&R) 

• Tactical 

- State-based 

- Prevent  impending  (<  2 minute)  loss  of  separation  (LOS) 

• Strategic 

- Intent-based 

— Prevent  future  (10-20  minutes  out)  LOS  event 

• Default  ACES  cannot  support  this  type  of  study 
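
To make the tactical, state-based case concrete, the sketch below projects two aircraft forward along their current velocity vectors and reports when horizontal separation would first be lost inside the look-ahead window. It is an illustration under stated assumptions, not the ACCoRD or ACES algorithm: the function name `time_to_los`, the flat 2-D geometry, and the 5 nmi separation threshold are all assumed here, and altitude is ignored.

```python
import math


def time_to_los(p1, v1, p2, v2, sep_nmi=5.0, lookahead_s=120.0):
    """Return seconds until horizontal separation first drops below sep_nmi,
    or None if that does not happen within lookahead_s.

    Positions are (x, y) in nautical miles, velocities (vx, vy) in knots.
    Both aircraft are assumed to hold their current velocity (state-based).
    """
    # Relative position (nmi) and relative velocity (nmi per second).
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    vx, vy = (v2[0] - v1[0]) / 3600.0, (v2[1] - v1[1]) / 3600.0

    c = rx * rx + ry * ry - sep_nmi * sep_nmi
    if c < 0.0:
        return 0.0                  # already inside the separation circle

    a = vx * vx + vy * vy
    b = 2.0 * (rx * vx + ry * vy)
    if a == 0.0 or b >= 0.0:
        return None                 # no relative motion, or diverging

    # Solve |r + v t|^2 = sep^2, i.e. a t^2 + b t + c = 0, for the entry time.
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None                 # paths never come within sep_nmi

    t_entry = (-b - math.sqrt(disc)) / (2.0 * a)
    return t_entry if t_entry <= lookahead_s else None


# Head-on encounter, 20 nmi apart, each aircraft at 480 knots.
t = time_to_los((0.0, 0.0), (480.0, 0.0), (20.0, 0.0), (-480.0, 0.0))
```

A strategic, intent-based check would replace the straight-line projection with each aircraft's planned trajectory and extend the look-ahead to the 10 to 20 minute range named on the slide.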


ACES  Capabilities 


• CD&R in ACES

- Tactical only

- Based on NAS Center boundaries

- Very limited capability


Nearby aircraft across Center boundary ignored


ACES Capabilities


• No M&S in ACES

- Default TG is MPAST

- MPAST does not model trajectory between arrival/departure fix and
airport (node/queuing model)
ACES Enhancements


• NASA Langley contracted software development for
prototype

• Intelligent Automation, Inc. (IAI)

- ACES development team member

- M&S concept developed in previous initiative
- CD&R (tactical) developed in previous initiative

• Expanded CD&R

- ACCoRD (tactical) - NASA LaRC, NIA
- Stratway (intent) - NASA LaRC

• M&S

- Refinement of IAI concept design
- Multi-Point Scheduling Algorithm - NASA ARC


Current  Status  - M&S 

• Two  airports  with  detailed  databases 

- Atlanta Hartsfield (KATL)

- Dallas/Fort  Worth  (KDFW) 

• M&S  development  complete 

• Testing  mostly  complete 

• Demonstration  of  full  system  in  progress 
Current Status - M&S


Video  of  ACES  simulation 
run  with  M&S  running 
traffic  to  KATL 


Current  Status  - CD&R 

• Implementation  complete  for  tactical  and 
strategic  CD&R 

• Work  on-going  with  CD&R  Stratway  and 
ACCoRD  team  to  provide  feedback  for 
continued  tool  development 

• Integration  with  M&S  completed 

• Testing  mostly  complete 

• Demonstration  of  full  system  (M&S  with 
CD&R)  in  development 




Current  Status  - CD&R 


Video  of  ACES 
simulation  running 
with  strategic  CD&R 
enhancements 


Future  Research  Possibilities 

• Quantification  of  airport  throughput  as  a 
function  of  aircraft  spacing  (R.  Brown;  2010) 

• Arrival  routing  concept  development  to 
improve  airport  throughput 

• Effect  of  CD&R  maneuver  strategies  on  system 
delay  and  fuel  efficiency 

• Impact  of  CD&R  on  M&S  efficiency  and 
robustness 
Questions/Discussion
Backup Slides


(Backup Slide) NAS Flights Estimates

FAA's Terminal Area Forecast, 2010, page 18:

2008 (last historic data available)

Yearly National Total Commercial (takeoffs and landings)          27951930
  (includes Alaska and West Indies)
Yearly Alaska                                                      -937116
Yearly Western Pacific                                            -4899428
Yearly Continental US (takeoffs and landings)                     22115386
Daily Flights (yearly operations/2 ops per flight/365 days)          30295

2030 (Projected Data)

Daily Flights ((36646248 NT - 1059046 AK - 6113579 WP)/365/2)        40375
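
The arithmetic on this backup slide can be checked with a few lines of Python. The function name and parameter names are illustrative only; the input figures are the ones quoted from the Terminal Area Forecast above, and results are rounded to the nearest whole flight.

```python
def daily_flights(national_ops, alaska_ops, western_pacific_ops,
                  ops_per_flight=2, days_per_year=365):
    """Convert yearly takeoff-and-landing counts into average daily flights
    for the continental US, following the arithmetic on the slide."""
    conus_ops = national_ops - alaska_ops - western_pacific_ops
    return round(conus_ops / ops_per_flight / days_per_year)


flights_2008 = daily_flights(27951930, 937116, 4899428)
flights_2030 = daily_flights(36646248, 1059046, 6113579)
growth = flights_2030 / flights_2008 - 1.0   # fractional increase, 2008 to 2030
```

The two results reproduce the slide's 30295 and 40375 daily flights, which corresponds to roughly a one-third increase in daily traffic.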
Abstract. The successful implementation of next generation infrastructure systems requires a solid understanding of their
technical, social, political, and economic aspects along with their interactions. The lack of historical data relevant to the long-term
planning of complex systems introduces unique challenges for decision makers and involved stakeholders, which in turn result in
unsustainable systems. Also, the need to understand the infrastructure at the societal level and capture the interaction between
multiple stakeholders becomes important. This paper proposes a methodology that takes a holistic approach, aiming to
provide an alternative subject-matter expert (SME) elicitation and data collection method for future sociotechnical systems. The
methodology is adapted to the Next Generation Air Transportation System (NextGen) decision-making environment in order to
demonstrate the benefits of this holistic approach.


1.0  INTRODUCTION 

The world is heavily dependent on various
critical infrastructures in areas like
transportation, power, communication,
water, energy, etc. Today’s critical
infrastructures are large-scale
sociotechnical systems, comprised of multiple
components, involving various stakeholders,
technologies, policies and social factors [1].
In recent years, numerous
sociotechnical systems started to undergo a
series of transitions. The definition of
system transition is given as “a long-term
fundamental change (irreversible, high-impact
and of high-magnitude) in the
cultures (mental maps, perceptions),
structures (institutions, infrastructures and
markets), and practices (use of resources)
of a societal system” [1]. In other words, the
transition includes “a structural change in
both technical and social subsystems” [2].

The planning and implementation phases of
such large-scale infrastructure transitions
require close monitoring of performance
parameters like safety, efficiency, and
sustainability. Ensuring that an infrastructure
transition yields a safer and more
sustainable system has become a major
challenge for society [3]. To do
so, decision makers often need to test
various strategies and perform analyses to
characterize risk and other parameters.
However, past strategies and historical data
regarding previous infrastructure systems
are no longer adequate for next generation
infrastructure system design for two reasons: (1)
previous systems evolved via incremental
changes and system improvements, which
led them to become unsustainable (e.g.,
congestion, energy shortages, air
transportation delays) and (2) previous
infrastructures were made to last: robust but
resistant to change [1, 4]. The lack of
empirical data causes decision makers to
rely heavily on expert opinions for next
generation infrastructure planning.

1.1.1  Lack  of  Data  and  Expert 
Elicitation 

The  future  status  of  man-made  systems  like 
energy,  transportation,  warfare,  agriculture, 
and  other  infrastructure  cannot  be  predicted 
over  a prolonged  time  frame.  Large-scale 
sociotechnicai  systems  are  made  up  of 
multiple  components  that  involve  numerous 
stakeholders,  technologies,  policies,  and 
social  factors  [1].  Decision  makers  and 
policy  makers  often  require  expert  opinion 
to  comprehend  the  complexity  and 
uncertainties  within  such  systems.  Expert 
elicitation  methods  typically  have  been  used 
to  obtain  the  necessary  data  for  reliability 
and  risk  studies  for  these  types  of 
technological, environmental, and
socioeconomic issues [5, 6]. Furthermore,
NextGen will inherently be different from the
current system. However, the ability to
predict the future remains limited owing to
the long-term implementation phase and the
large number of uncertainties [5]. “There
are  no  data  about  the  future  on  which  to 
rely.  We  are  challenged  to  imagine  many 
different  and  possible  ‘futures’  as 
humankind  seeks  to  exert  its  mastery  and 
control”  [7].  The  crucial  task  is  to  think 
innovatively  and  recognize  the  creative  and 
imaginative  capacities  of  each  stakeholder. 
The  overall  goal  of  the  methodology  is  to 
reduce  complexity  and  uncertainty  while 
inventing  the  future  and  analyzing  the 
respective  risk  for  each  alternative  scenario 
[7]. 

1.1.2  Sociotechnical  Complexity 

The  system-wide  upgrade  of  complex 
systems  is  a challenging  undertaking  [8]. 
The increased complexity adds to the
diversity of decision makers' system
interpretations, which can directly alter the
overall system operation and the
decision-making processes. Brewer [7] states that
real-world  problems  do  not  exist 
independently  of  their  socio-cultural, 
political,  economic,  and  physiological 
content,  and  for  that  reason,  an  approach 
with  multiple  perspectives  and  multiple 
disciplines  is  necessary  to  efficiently  clarify 
the  matter  at  hand  (which  is  quite 
challenging  in  practice).  The  presence  of 
multiple  actors  with  frequently  divergent  and 
conflicting  interests  can  turn  large-scale 
infrastructure  transitions  into  wicked 
problems  [9].  Traditional  policy  and  market 
practices  have  proven  ineffective  in  dealing 
with  problems  with  a high  degree  of 
uncertainty  regarding  future  scenarios  and 
actor  interactions  [1].  For  this  reason, 
creating  a methodology  that  attempts  to 
predict  decision  pathways  for  future 
systems  while  accounting  for  the  technical, 
organizational,  and  contextual  complexity  of 
the  system  is  necessary  [4,  7]. 


2.0  TEST  CASE 

The  goal  of  this  research  is  to  develop  a 
methodology  that  will  serve  as  an  aid  for 
decision  makers  who  are  responsible  for 
designing  and  evaluating  scenarios  for 
future  technological  implementations  within 
next  generation  infrastructure  systems.  As 
previously mentioned, the implementation of
large system transitions requires an
understanding of the multi-layer complexity
and overcoming the lack of experimental
data for designing the future phases of the
system. In order to demonstrate the
proposed  methodology,  the  planning, 
development,  and  implementation  of  the 
Next  Generation  Air  Transportation  System 
(NextGen)  is  used  as  a test  case.  The 
following  sections  will  provide  insights  on 
NextGen. 

2.1  NextGen  Overview 

The  U.S.  National  Airspace  System  (NAS) 
is  made  up  of  a number  of  multifaceted 
elements, including over 800 million
passengers and input from more than
15,000 air traffic controllers to assist
590,000 pilots onboard 239,000 aircraft that
take off and land at 20,000 U.S. airports.
This extremely complex system is closely
tied to the national economy, contributing
$1.2 trillion annually and over 5 percent of
the  gross  domestic  product  while  generating 
11  million  jobs  and  $369  billion  in  earnings 
[10]. 

The  delays  that  currently  impact  passenger 
travel  are  forecasted  to  be  even  higher  in 
the  future  as  the  demand  for  air 
transportation  is  expected  to  increase.  In 
addition,  future  airspace  is  expected  to 
accommodate  unmanned  aircraft  systems 
and  commercial  space  vehicles  as  well. 
Furthermore,  the  entire  system  is  expected 
to  operate  within  acceptable  safety  levels 
and  environmental  impact  guidelines  [10]. 

To  respond  to  this  forecasted  increase  in 
demand,  the  Joint  Planning  and 
Development  Office  (JPDO)  was  formed 
during  the  Bush  Administration  in  2003.  This 
organization  is  a partnership  between  public 
and private stakeholders, including the
Federal Aviation Administration (FAA), the
Department of Defense (DoD), the
Department of Homeland Security (DHS),
NASA, and others [11]. The JPDO is
charged  with  developing  concepts, 
architectures,  roadmaps,  and 
implementation  plans  for  transforming  the 
current  national  Air  Transportation  System 
(ATS)  into  NextGen. 

“During  the  next  two  decades,  demand  will 
increase,  creating  a need  for  a system  that 
(1)  can  provide  two  to  three  times  the 
current  air  vehicle  operations;  (2)  is  agile 
enough  to  accommodate  a changing  fleet 
that  includes  very  light  jets  (VLJs), 
unmanned  aircraft  systems  (UASs),  and 
space  vehicles;  (3)  addresses  security  and 
national  defense  requirements;  and  (4)  can 
ensure  that  aviation  remains  an 
economically viable industry” [8].

2.2  NextGen  Challenges 

The  complex  nature  of  the  NAS,  combined 
with  numerous  operational  and 
management  challenges,  threatens  the 
NextGen  efforts.  Reports  from  the  Office  of 
the  Inspector  General  (OIG)  reveal  that  the 
Federal  Aviation  Administration  (FAA)  is 
facing  difficulties  in  developing  a strategy  to 
engage  stakeholders,  not  to  mention 
managing  and  integrating  multiple  NextGen 
efforts [12]. Uncertainties and a lack of data
related  to  shaping  a future  aviation  system 
also  inhibit  the  ability  to  employ  formal  risk 
analysis  methods.  As  a result,  SME  opinion 
has  become  the  primary  source  of  input  for 
the  NextGen  scenarios,  technologies,  and 
safety. 

2.3  Need  for  a Methodology 

In  the  past,  traditional  engineering  design 
approaches  focused  primarily  on  the 
technical  requirements.  Similarly,  traditional 
infrastructure  designs  were  treated  like 
traditional  engineering  problems,  causing 
them  to  be  brittle  and  resistant  to 
modernization.  However,  the  next 
generation  infrastructures  must  be  treated 
differently because of their complex nature
[13]. The need to understand the
infrastructure  at  the  societal  level  and 
capture  the  interaction  between  the 
technical,  political,  and  economic  factors 
becomes  more  important  [4].  Traditional 
engineering  design  methods  are  used  in 
concert  with  serious  gaming  approaches  to 
create  a holistic  decision-making 
methodology.  The  goal  of  the  proposed 
methodology  is  to  enable  decision  makers 
and  researchers  to  gather  information  in 
regard  to  NextGen  safety  values.  Toward 
that  purpose,  various  tools  and  techniques 
are  employed  collectively  here  to  create  a 
methodology  that  can  be  used  as  an 
alternative to conventional expert elicitation
techniques (e.g., the Delphi method, the nominal
group technique, and brainstorming) for
complex systems with multiple
stakeholders.

3.0  PROPOSED  METHODOLOGY 

As  discussed  above,  the  methodology  for 
estimating  risks  within  a future  system  will 
combine  various  approaches.  Because  the 
air  transportation  system  includes  extensive 
interactions  between  multiple  stakeholders, 
which  can  be  difficult  to  track,  and  because 
of  the  lack  of  historical  data,  SMEs  from 
diverse  backgrounds  are  the  main  source  of 
data  for  this  study  [6].  Aviation  safety  within 
NextGen  is  measured  by  using  the 
probability  number  method.  Conventional 
numerical  and  qualitative  expert  elicitation 
techniques  provide  the  gaming  data  that  are 
necessary  to  construct  the  scenarios, 
alternatives,  attributes,  and  so  on. 
Commercial-off-the-shelf software packages
(i.e., Logical Decisions for Windows® and
Precision Tree® by Palisade Corp.) are also
used  to  rank  future  technologies  in  order  to 
support  SME  opinions  before  and  during  the 
gaming  cycle.  Fig.  1 provides  an  overview 
of  the  methodology.  The  various  tools  and 
techniques  are  described  in  more  detail  in 
subsequent  sections. 
Figure 1. Serious gaming methodology: high-level
architecture

3.1  Serious  Gaming  and 
Infrastructure  Design 

The  use  of  gaming  and  simulation 
techniques  as  a formal  approach  to  strategy 
making  has  gained  wide  acceptance,  as 
evidenced  by  the  frequency  of  occurrence 
within mainstream strategy literature [14].

Gaming methods (or soft systems thinking)
have become an alternative to formal
complexity modeling techniques like
system dynamics and operations research.
These formal techniques have been successfully
applied to well-structured problems;
however, when employed on ambiguous,
often ill-structured, and complex
systems, their contribution has been limited
because adequate theory and empirical
data were absent [14].

Serious  gaming  methods  are  able  to  provide 
decision  makers  with  an  environment  in 
which  the  totality  of  the  system  and  its 
dynamics  are  present.  With  a holistic 
approach that includes the wide range of
perspectives, skills, information, and mental
models of the involved parties, the quality of
the decision-making environment increases
dramatically [13, 14]. Unlike hard-system
methods, the gaming and simulation
approach is quite flexible and easily
adaptable to other quantitative methods,
scenarios, and computer models [16].

Policy  gaming  methods  can  help  both 
participants  and  modelers  understand  the 
big picture and identify critical elements of
the complex problem at hand. Because of
the iterative and experimental nature of
these gaming and simulation environments,
participants are able to test different
approaches within both a safe environment
and a condensed timeframe [15].

According  to  Duke  [17],  a typical  complex 
real-world  situation  has  the  following 
characteristics: it contains numerous
variables  in  interaction;  no  realistic  basis 
exists  for  quantification  of  these  variables  or 
their  interactions;  and  no  proven  conceptual 
model  or  precedent  exists  on  which  action 
decisions  can  be  based.  Complex  systems 
are  also  typified  by  a sociopolitical  context 
of  decision-making,  where  the  actions  of  the 
various  “players”  may  be  idiosyncratic  or 
irrational;  furthermore,  the  decisions  are 
irreversible  and  the  results  are  not  generally 
fully  understood  until  well  into  the  future 
[17]. NextGen fits this model, as a complex
real-world  air-transportation  system  that  will 
undergo  a full-scale  transformation  and  that 
contains  numerous  stakeholders  with  often 
conflicting  agendas,  including  those  of  the 
general  public  [18].  The  gaming  context 
may  help  capture  the  organizational  and 
behavioral  dynamics  of  the  decision-making 
process  and  ultimately  yield  a more  realistic 
problem  solution. 

The  following  section  provides  insight  on  the 
probability  number  method  (PNM),  which  is 
used  as  the  backbone  for  the  NextGen  risk 
calculation  method. 

3.2  Probability  Number  Method 

The  PNM  was  created  through  a joint  effort 
between  the  International  Atomic  Energy 
Agency  (IAEA)  and  several  United  Nations 
organizations.  The  method  was  developed 
as  an  affordable  solution  to  quickly 
determine  the  risks  that  are  associated  with 
handling,  storing,  processing,  and 
transporting  hazardous  materials.  The 
methodology  is  supported  by  an  extensive 
database  that  includes  the  various  factors 
that  impact  the  risks,  including  types  of 
substances  (i.e.,  flammable,  toxic,  or 
explosive  gases  or  liquids),  safety 
precautions, population densities,
environmental factors, and so on [19]. The
average accident scenario contains only
rough estimates because the purpose of the
methodology is to serve as a
decision-making aid that enables risk ranking and
prioritization for further analysis. Dr. Adrian
Gheorghe  was  a part  of  the  development 
team  for  the  PNM  and  brings  his  expertise 
to  decoding,  modifying,  and  adapting  the 
probability  number  method  to  the  NextGen 
system. 

3.2.1  Probability  Number 

Within  the  PNM,  the  probability  of  the 
occurrence  of  a certain  accident  is 
calculated  via  a dimensionless  probability 
number  N,  which  is  then  transformed  into  an 
actual  probability.  The  probability  number 
can  be  adjusted  or  updated  based  on 
various  correcting  factors.  The  risk  is 
defined  as  the  product  of  the  consequences 
and  the  probabilities  of  unwanted  outcomes 
(i.e.,  hazardous  events). 
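
The bookkeeping described here can be sketched in a few lines. The paper does not give the transformation from N to a probability or the form of the correcting factors, so this sketch assumes a logarithmic convention, P = 10^(-N), and additive corrections to N; both are assumptions, and all function names and numbers below are illustrative rather than taken from the paper.

```python
def adjusted_probability_number(base_n, corrections):
    """Apply correcting factors to a base probability number.
    (Additive corrections are an assumption of this sketch.)"""
    return base_n + sum(corrections)


def probability_from_number(n):
    """Assumed conversion: a probability number N maps to P = 10**(-N)."""
    return 10.0 ** (-n)


def risk(probability, consequence):
    """Risk as the product of probability and consequence, as in the text."""
    return probability * consequence


# Illustrative values only; none of these come from the paper.
n = adjusted_probability_number(base_n=5.0, corrections=[0.5, 0.5])
p = probability_from_number(n)
r = risk(p, consequence=100.0)
```

Under these assumptions, each unit added to N lowers the accident probability by a factor of ten, so a correcting factor acts as a multiplicative safety improvement on the underlying probability.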

3.2.2  Adapted  Consequences  and 
Probabilities 

The  PNM  defines  risk  as  the  product  of  the 
probability  of  an  accident  and  its  respective 
consequences,  calculated  separately. 

Within the NextGen framework, the
consequences  are  defined  as  fatal  aviation 
accidents  (i.e.,  accidents  per  100,000  flight 
hours).  The  probability  of  an  accident  that 
involves  a passenger  fatality  is  calculated  in 
the  following  manner.  An  average 
probability  number  that  represents  the  base 
assumption  is  determined;  then,  this 
number  is  adjusted  by  using  correcting 
factors.  These  factors  represent 
technological  improvements  and  other 
enablers  that  are  planned  within  the 
NextGen  framework,  namely,  runway 
safety,  aircraft  reliability,  icing,  turbulence 
mitigation,  weather,  and  airborne  collision 
avoidance. The adjusted accident
probability number N is then converted into a
frequency of occurrence.


The probability of occurrence and the
consequence factors are entered into the
FAA’s Risk Matrix (Fig. 2). The initial
conditions for the risk are determined by the
averaged accident data (2000-2009), which
are obtained from the NTSB website¹. The
average severity of aviation accidents is
0.291 fatalities/100,000 flight-hours, or
severity classification “Minor, 4”; meanwhile,
the probability of such an accident is
0.208/100,000 flight-hours, indicating a
“Remote, C” likelihood category. Based
on the values above, the current aviation
risk is determined to be “Low Risk, green”.




Figure  2.  FAA  risk  matrix. 

The  PNM  was  chosen  to  be  the  backbone 
for  the  risk  estimation  engine  in  this 
application  as  a result  of  its  intuitive 
structure  and  ease  of  expandability.  The 
main  components  of  the  PNM,  namely,  the 
consequences,  probabilities,  and  risk 
outcomes,  are  incorporated  into  the 
NextGen  Safety  Assessment  Methodology 
and  fused  with  the  policy  gaming  effort. 

¹ http://www.ntsb.gov/aviation/Stats.htm

3.3 Software Add-Ons

The selected gaming platform enables the
integration of additional methods and
techniques, which allows the methodology
to remain flexible and expandable. This in
turn ensures a more thorough
representation of the air transportation
infrastructure. Two commercial-off-the-shelf
software solutions have been embedded
into the methodology to enhance the
ranking and prioritization of enablers and
other damage indicators. The Logical
Decisions for Windows® software supports the
prioritization and ranking of NextGen
safety-related technological enablers with respect
to their benefits, costs, implementation
timelines, and other parameters. The
Precision Tree® software is used to collect
and further analyze the data that are
obtained from the gaming exercise.
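
As a rough illustration of ranking enablers against several attributes, the sketch below uses a simple weighted sum. This is a stand-in chosen for clarity, not the algorithm used by either commercial package; the weights, scores, and function name are hypothetical, and scores are assumed to be normalized to the range 0 to 1 with higher values being better.

```python
def rank_enablers(scores, weights):
    """Rank options by a weighted sum of normalized attribute scores
    (0 to 1, higher is better). Returns (name, total) pairs, best first."""
    totals = {
        name: sum(weights[attr] * value for attr, value in attrs.items())
        for name, attrs in scores.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weights and scores, for illustration only.
weights = {"benefit": 0.5, "cost": 0.3, "timeline": 0.2}
scores = {
    "runway safety": {"benefit": 0.8, "cost": 0.6, "timeline": 0.9},
    "icing mitigation": {"benefit": 0.5, "cost": 0.8, "timeline": 0.6},
    "airborne collision avoidance": {"benefit": 0.9, "cost": 0.3, "timeline": 0.4},
}
ranking = rank_enablers(scores, weights)
```

Changing the weights to reflect a different stakeholder's priorities reorders the list, which is the kind of sensitivity the gaming exercise is meant to expose.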


Figure  3.  NextGen  safety  assessment  gaming 
sequence. 


4.0  GAMING  SEQUENCE 

The  NextGen  safety  assessment 
methodology  was  developed  on  a serious 
gaming  platform  that  was  adopted  from  the 
play  sequence  of  policy  gaming  developed 
by  Geurts,  Duke,  and  Vermeulen  [14].  An 
adapted  version  of  the  play  sequence, 
which  accommodates  the  NextGen  safety 
framework,  is  given  in  Fig.  3.  The  sequence 
is  initiated  by  the  presentation  of  the  game 
to  the  stakeholders;  this  includes  providing 
the  game  rules,  a general  overview  of 
NextGen  goals,  and  the  available 
resources.  Stakeholder  groups  that  contain 
participants  from  various  backgrounds  are 
formed,  and  their  respective  goals  are 
established  (e.g.,  the  FAA  is  concerned  with 
safety  goals,  commercial  airlines  with 
economic  goals,  and  so  on).  The  groups 
are  asked  to  evaluate  and  select  from  the 
list  of  technological  advancements  that  are 
related  to  the  improvement  of  safety. 
However,  the  implementation  of  each  of 
these  advancements  consumes  some  of  the 
predetermined  limited  resources. 
Stakeholders  with  conflicting  agendas  must 
come  to  a consensus  on  certain  decisions. 
Following  these  discussions,  the  decisions 
for  each  time  step  are  entered  into  the  risk 
simulation  mechanism  (based  on  the  PNM) 
and  the  updated  NAS  risk  values  are 
calculated  iteratively  for  the  next  three  time 
steps,  until  year  2025.  The  gaming  exercise 
is  concluded  with  the  debriefing  and 
discussions  in  order  to  create  the  foundation 
for  the  data  gathering  and  analysis. 
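
The iterative part of this sequence can be sketched as a loop over time steps in which agreed decisions consume a shared budget and adjust the risk estimate. The sketch is illustrative only: the years chosen for the three time steps, the enabler costs and corrections, the budget, and the function name `play_game` are all hypothetical, and the consensus-building among stakeholders is taken as an input rather than modeled.

```python
def play_game(base_n, budget, decisions, enablers):
    """Run the decision/risk-update cycle over successive time steps.

    decisions: {year: [enabler names agreed by the stakeholders]}
    enablers:  {name: (cost, correction added to the probability number)}
    Returns a list of (year, remaining budget, probability number).
    """
    n, history = base_n, []
    for year in sorted(decisions):
        for name in decisions[year]:
            cost, correction = enablers[name]
            if cost > budget:
                continue            # unaffordable choices are skipped
            budget -= cost
            n += correction
        history.append((year, budget, n))
    return history


# Hypothetical inputs; the years, costs, and corrections are not from the paper.
enablers = {"runway safety": (40, 0.25), "icing": (30, 0.25),
            "collision avoidance": (50, 0.5)}
decisions = {2015: ["runway safety"],
             2020: ["icing", "collision avoidance"],
             2025: ["collision avoidance"]}
history = play_game(base_n=5.0, budget=100, decisions=decisions,
                    enablers=enablers)
```

In this run the budget is exhausted before the collision-avoidance enabler can be funded, so the risk estimate stops improving after the second step; trade-offs of this kind are what the stakeholder discussions are intended to surface.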


4.1  Stakeholders  and  Game  Rules 

One  of  the  most  productive  outcomes  of  the 
policy  gaming  exercise  is  that  the 
participants  are  able  to  interact  based  on 
the  problem  at  hand.  The  “safe” 
environment  allows  participants  to  create 
and  analyze  the  system  complexity  while 
communicating  various  aspects  of  the  issue 
among  the  stakeholders  [7].  In  order  to 
model  such  a dynamic  environment,  a 
simplified  list  of  involved  stakeholders  and 
engagement  rules  was  developed. 
Stakeholder  interactions  can  be  based  on 
rigid rules, free-form rules, or a combination of
the  two.  Rigid,  rule-based  gaming  is  well 
suited  for  structured  environments,  such  as 
military  gaming  where  specific  rule  sets  that 
can  be  formalized  by  mathematical  or 
computational  methods  are  used.  However, 
for  social  arenas  that  include  both  public 
and  intense  stakeholder  interactions  without 
firm  rules,  free-form  gaming,  which  relies  on 
game  rules,  is  more  suitable.  Free-form 
games  enable  the  participants  to  challenge, 
modify,  and  improve  the  positions,  objects, 
and  rules  during  the  game  play.  However, 
the  process  must  be  carefully  monitored  by 
a control  team  of  experts  who  act  as 
referees  or  game  directors  [16]. 

Within  the  scope  of  this  project,  the  primary 
goal  was  to  provide  insight  into  the  future  of 
NAS  safety  and  data  gathering  in  regard  to 
future  systems.  Thus,  a combination  of  rigid 
and  free-form  gaming  rules  was  employed. 
The  goal  of  the  game  was  to  simulate  the 
aviation safety values within a given time
frame, while taking into consideration
technological constraints (i.e., cost versus
benefit, feasibility, mixed equipage, and so
on) and behavioral concerns (i.e.,
information overload to pilots, controllers,
early technology adopters, and so on).

4.2  Outcomes 

Throughout  the  gaming  effort,  the 
discussions  and  negotiations  that  occurred 
between  participants  with  opposing 
agendas  were  important  and  can  be  used  to 
develop  different  problem-solving 
approaches.  The  gaming  exercise  serves 
both  as  an  individual  and  a collective 
learning  platform  for  the  stakeholders, 
leading  to  an  overall  elevated  level  of 
knowledge  across  the  system.  The 
individual learning took place during the
decision-making process, in which each
stakeholder group represented its
respective point of view. The awareness
that was gained in regard to the
overall system complexity will ultimately
improve the value of the expert elicitation.
Furthermore,  the  presence  of  realistic 
interactions  among  the  players  yielded  data 
that  can  be  used  in  the  testing  and 
evaluation of NextGen-related technologies
in the future [20]. In addition to the
individual  learning,  the  collective  learning 
(or  the  organizational  learning)  provided 
valuable  insight  that  relates  to  the  problems 
that were discussed (i.e., NextGen aviation
safety  values). 

One  of  the  most  tangible  outcomes  of  the 
gaming  exercise  was  the  2025  NAS  safety 
values  with  respect  to  the  FAA’s  Risk  Matrix 
(Fig.  2)  acceptability  measures.  The 
intermediate  risk  values  that  were  obtained 
during  the  technology  implementation  phase 
(i.e.,  the  next  15  years)  were  elicited  under 
the  same  assumptions.  The  cumulative 
effect  of  various  safety-related  technological 
implementations  can  be  examined,  which 
will  enable  decision  makers  to  identify 
technologies  or  areas  that  require  further 
analysis  and  understanding. 


5.0  CONCLUSIONS 

The planning and implementation of next
generation  infrastructure  transitions  are 
challenging  due  to  their  complex  nature  and 
the  lack  of  historical  data.  This  paper 
proposes  the  use  of  simulation  and  gaming 
methods  as  a platform  for  evaluating  and 
generating  necessary  data  for  designing 
future  infrastructure  systems.  The  Next 
Generation  Air  Transportation  System 
(NextGen)  decision  making  environment  is 
used  as  a test-bed  to  demonstrate  the 
developed  methodology. 

Subject-matter-expert  opinions  were  heavily 
relied  upon  to  develop  the  gaming 
components,  to  decide  on  the  participants, 
and  finally,  to  evaluate  the  validity  of  the 
framework.  Conventional  risk  calculation 
methods  and  commercial-off-the-shelf 
software  capabilities  were  integrated  to 
provide  system-level  overview  and  risk 
analysis  as  a decision-making  tool.  One  of 
the  most  prominent  contributions  of  the 
gaming  exercise  was  its  ability  to  aggregate 
the  perspectives  of  multiple  stakeholders 
with  varying  agendas,  while  calculating  the 
effectiveness  of  future  NextGen  safety 
enablers.  The  gaming  environment 
promotes  individual  and  collective  learning 
across  the  system,  allowing  subject-matter 
experts  to  express  their  opinions  for  a more 
thorough  and  accurate  modeling  of  the 
future infrastructure.


Introduction

Large  Scale  Critical  Infrastructures 

• Future infrastructures are shaping up to become more complex and more interrelated than ever,
causing older systems to be unsustainable, faced with congestion, energy shortage, blackouts,
etc.

• Transforming today's infrastructures and planning for the future is a challenging task since

■ Infrastructure systems are inherently robust and resistant to change

■ We lack the empirical data for next generation infrastructure planning

■ Increased multi-level complexity of these systems


National Aeronautics and Space Administration


Ancel, Gheorghe, and Jones (2010)


Introduction  (cont’d) 

Lack  of  Data/Expert  Elicitation  Challenges 

■ Large  amount  of  complexity  and  uncertainty 

■ Expert  elicitation  methods  have  typically  been  used  to  obtain  the  necessary  data  for  reliability 
and  risk  studies  of  technological,  environmental  and  socioeconomic  issues 

■ Necessary to be creative about possible "futures"


Introduction (cont’d)

Sociotechnical  Complexity 

• Presence of multiple actors with frequently divergent and conflicting interests

• Traditional policy and market practices have proven ineffective in dealing with the high degree of
uncertainty caused by future scenarios and actor interactions


Problem Definition


• A methodology  capable  of  addressing  both  technical  and  social  aspects  of  the  system,  taking 
into  consideration  the  technical,  organizational,  and  contextual  complexity  of  the  system  in  a 
holistic  manner 

• Creating a venue to understand and enhance the communication between multiple stakeholders
and decision makers


Test Case

National  Airspace  System  (NAS) 

• World’s  most  complex  aviation  system 

• Consists of

■ 15,000 air traffic controllers

■ 20,000 airports

■ 800B passengers

■ About 50,000 flights/day

• Contributing $1.2 trillion to the economy (5% of GDP)


Test Case (cont’d)

NextGen  and  Its  Challenges 

• Current  NAS  capacity  is  no  longer  adequate  for  upcoming  demand  (i.e.  delays) 

• Next  Generation  Air  Transportation  System  (NextGen)  is  the  transformation  of  NAS  to  reflect 
2025  goals  and  requirements 

• Up  to  3 times  the  passenger  capacity 

■ Increased  safety  levels  while  reducing  the  environmental  impacts 

• FAA  is  facing  difficulties  with 

• Managing  efforts  in  an  integrated  way 

■ Engaging  private  sector  in  the  process 

• Coordinating a multi-agency approach


Proposed Methodology

• Components 

■ Conventional  risk  assessment  techniques 

■ Serious  gaming  methods 

• Objective is to help the decision making process for System of Systems (SoS) level infrastructures
by first reaching a common understanding of the system complexity by all stakeholders and then
generating knowledge about the system under consideration


Serious Gaming

• Developed during the 1950s and 1960s

• Initially used within urban studies, political science, and business

• The goal is to engage participants in a safe environment in order to ...

...create and analyze the "futures" they want to explore
...pre-test strategic initiatives

...deal with the increasing organizational complexity by enhanced communication


Serious Gaming (cont’d)

• Serious  gaming  methods  became  an  alternative  to  formal  complexity  modeling  techniques 

■ Simulates experimental, rule-based interactive environments for players to learn from their
actions

■ Flexible and easily coupled with other quantitative methods, scenarios and computer models

• Helps participants and decision makers understand and communicate complexity


Probability Number Method

• Developed with the collaboration of

■ International Atomic Energy Agency (IAEA),

■ United Nations Environment Programme (UNEP),

■ United Nations Industrial Development Organization (UNIDO),

■ World Health Organization (WHO) within the United Nations

• Dr. Gheorghe was a part of the Scientific Secretariat

• The method was developed to determine the risks associated with major accidents with off-site
consequences in fixed installations handling, storing, and processing hazardous
materials (flammable, toxic, or explosive gas or liquid) or the transportation of such materials
by road, rail, pipeline, and inland waterway


Probability Number Method (cont’d)

• Instead of calculating the probability values directly, the PNM determines a number that is then
converted into a probability value (e.g., 3x10 '' accidents/year)

• The relationship is given as


$N = |\log_{10} P|$

• The product of probabilities and consequences (calculated separately) of each risk category
identifies its respective risk
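
The conversion between a probability number and a probability can be illustrated with a short sketch. It assumes the relationship N = |log10 P| with 0 < P <= 1, so that P = 10^(-N); the function names and the sample consequence value are hypothetical and are not taken from the PNM documentation.

```python
import math


def probability_number(p: float) -> float:
    """Return N = |log10(P)| for a probability or annual frequency 0 < P <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return abs(math.log10(p))


def probability_from_number(n: float) -> float:
    """Invert the relationship: because P <= 1, P = 10 ** (-N)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return 10.0 ** (-n)


def risk(probability: float, consequence: float) -> float:
    """Risk of one category as the product of probability and consequence."""
    return probability * consequence


if __name__ == "__main__":
    n = probability_number(1e-4)           # probability number for 1e-4 per year
    p = probability_from_number(n)         # back to a probability
    print(n, p, risk(p, consequence=50.0)) # consequence value is hypothetical
```

Working on the number N rather than on P directly keeps the quantities on a logarithmic scale, which is convenient when correction factors are applied additively.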


FAA Risk Matrix


Supportive Add-Ons (COTS Software)

• The serious gaming exercise is supported by two commercial-off-the-shelf (COTS) software
packages

Logical  Decisions  for  Windows 

■ Used  to  rank  and  prioritize  the  technological  and  other  enablers 

■ Provided  to  the  players  throughout  the  game 

■ Dynamic  ranking  helps  the  decision  making  process 

Precision  Tree 

■ Will be used to collect and further analyze the data extracted from the gaming exercise


Gaming Sequence


Scenario Presentation: National Airspace System in 2025

- NextGen goals, resources, and game rules are explained

Process (@ Time = t)

Step 1. Event: Present the new risk information
Step 2. Stakeholder group meetings: develop new implementation plan
Step 3. Discussions and interactions
Step 4. Make decisions
Step 5. Process the decisions


The Risk Simulation Mechanism

Inputs: Decisions, correcting factors

Modified PNM:
Updated Probabilities
Updated Consequences
Updated Risk Matrix

Outputs: Updated NAS Risk, scenario(t+1)




Outcomes 

• Holistic approach (i.e., sociotechnical system as a whole)

• Individual  learning 

■ Learning  and  communicating  complexity  among  stakeholders 

■ Informed participants/decision makers yielding better expert elicitation

• Collective  Learning 

■ Future  aviation  safety  values 

■ Decision making support for further analysis


Conclusions


• Planning  and  implementation  of  next  generation  infrastructure  transitions  are  challenging  due  to 
their  complex  nature  and  the  lack  of  historical  data 

• The ability to aggregate the perspectives of multiple stakeholders with varying agendas, while
calculating the effectiveness of future NextGen safety enablers is addressed


4.4 Consequence and Resilience Modeling for Chemical Supply Chains


2010-5149  C 

Consequence  and  Resilience  Modeling  for  Chemical  Supply  Chains 

Kevin  L.  Stamber,  Eric  D.  Vugrin,  Mark  A.  Ehlen,  Amy  C.  Sun,  Drake  E.  Warren,  and  Margaret  E.  Welk 

Sandia  National  Laboratories 

klstamb@sandia.gov edvugri@sandia.gov maehlen@sandia.gov acsun@sandia.gov
dewarre@sandia.gov mewelk@sandia.gov

The U.S. chemical sector produces more than 70,000 chemicals that are essential material inputs to critical infrastructure systems,
such as the energy, public health, and food and agriculture sectors. Disruptions to the chemical sector can potentially cascade to
other dependent sectors, resulting in serious national consequences. To address this concern, the U.S. Department of Homeland
Security (DHS) tasked Sandia National Laboratories to develop a predictive consequence modeling and simulation capability for
global chemical supply chains. This paper describes that capability, which includes a dynamic supply chain simulation platform
called N-ABLE™. The paper also presents results from a case study that simulates the consequences of a Gulf Coast hurricane on
selected segments of the U.S. chemical sector. The case study identified consequences that include impacted chemical facilities,
cascading impacts to other parts of the chemical sector, and estimates of the lengths of chemical shortages and recovery. Overall,
these simulation results can help DHS prepare for and respond to actual disruptions.


1.0  INTRODUCTION 

The U.S. Department of Homeland Security
(DHS) [1] has identified that “protecting and
ensuring the continuity of the critical
infrastructure and key resources (CIKR) of
the United States is essential to the Nation’s
security, public health and safety, economic
vitality, and way of life.” The chemical sector
serves  as  one  of  the  18  CIKR  sectors 
identified  by  DHS. 

Analysis  of  chemical  supply  chains  within 
this  context  is  an  inherently  complex  task, 
given  the  dependence  of  these  supply 
chains  on  multiple  CIKR  systems  (e.g., 
transportation,  energy).  This  effort  requires 
data  and  information  at  various  levels  of 
resolution,  ranging  from  network-level 
supply  chain  systems  to  individual  chemical 
reactions. 

DHS  has  tasked  the  National  Infrastructure 
Simulation  and  Analysis  Center  (NISAC) 
with  development  of  a chemical 
infrastructure  analytical  capability  to  assess 
interdependencies  and  complexities  of  the 
nation’s  critical  infrastructure,  including  the 
chemical  sector.  The  Federal  Government 
established NISAC, which includes
personnel at Sandia National Laboratories
(Sandia) and Los Alamos National
Laboratory, to support efforts aimed at
identification of dependencies within and
across sectors, providing consequence
assessment  to  enable  National  Risk 
Analysis. 

To  address  this  need,  DHS’s  Science  and 
Technology  Directorate  has  funded  the 
Sandia  component  of  NISAC  in  an  ongoing 
effort  to  integrate  its  existing  simulation  and 
infrastructure  analysis  capabilities  with 
various  chemical  industry  datasets.  The 
intent  of  this  effort  is  to  develop  and 
ultimately  provide  capabilities  in 
consequence  and  resilience  analysis  of 
natural  and  manmade  events  that  impact 
the  chemical  industry  and  chemical- 
dependent  sectors  of  the  economy. 

This  document  describes  key  elements  of 
this  ongoing  development  effort,  including 
the  modeling  and  simulation  tools  utilized  in 
analyzing  the  chemicals  sector  from 
different  perspectives.  This  includes  a case 
study,  examining  the  effects  of  a Gulf  Coast 
hurricane  on  segments  of  the  chemicals 
sector  and  an  examination  of  consequence 
and  resilience  metrics. 

2.0  BODY 

Consequence  and  resilience  analysis  of  the 
chemicals  sector  requires  a wide  range  of 
modeling  techniques  to  answer  questions  of 
varying  scopes,  acting  on  a common  data 
set. To do this, Sandia developed and
populated a common data model, a set of
modeling  capabilities  with  different 
resolutions,  and  a framework  for  analyzing 
resilience. 

2.1  Chemical  Data  Model  (CDM) 

Central  to  the  development  effort  aimed  at 
providing  consequence  and  resilience 
analysis  is  a common  Chemical  Data  Model 
(CDM).  The  CDM  draws  on  infrastructure, 
population,  labor,  economic  and  other  data 
sets  from  a variety  of  commercial  (e.g.,  SRI 
Consulting, PennWell, Minnesota IMPLAN
Group)  and  government  (e.g.,  U.S.  Bureau 
of  the  Census,  National  Geospatial- 
Intelligence  Agency,  and  Surface 
Transportation  Board)  data  sources,  as  well 
as  on  data  developed  during  the  project. 
CDM  data  are  updated  annually,  at 
minimum,  or  as  often  as  updates  become 
available. 

Figure  1 represents  a simplified  schematic 
of  how  the  information  within  CDM  is 
organized,  merged,  and  stored.  Each 
chemical  plant  in  the  database  has 
attributes  that  identify  where  it  is  located, 
what  chemicals  are  produced  and  stored, 
the  associated  capacities  for  production  and 
storage,  and  what  production  technologies 
are  used  at  the  plant. 


Figure 1. The Chemical Data Model.

2.2 Consequence Analysis Tools

A variety of Sandia-developed tools
leverage this common data structure for
various aspects of analysis of the chemicals
sector (and other sectors as appropriate to
the question).

2.2.1 FASTMap

Geospatial analysis is conducted using a
tool called FASTMap. FASTMap is a
geographic information system (GIS)-based
tool that creates common look-and-feel,
production-quality maps of the chemicals
sector and other CIKR sectors relative to
disruption areas, and provides data on
CIKR assets (e.g., names, number of
facilities) in the disruption area.

2.2.2 Fast Analysis Infrastructure
Tool (FAIT)

Infrastructure dependency analysis is
conducted within a tool called the Fast
Analysis Infrastructure Tool (FAIT). FAIT
provides data on the dependencies of
specific chemical sector components (e.g.,
plants and pipelines) on assets in other
infrastructures (e.g., electric power,
transportation, emergency services).

2.2.3 Loki

Network analysis is conducted using a tool
called Loki, which is a network model and
analysis tool designed to quickly estimate
potential production losses among chemical
manufacturing processes.


2.2.4 Railroad-Network Analysis
System (R-NAS)

Rail  transportation  analysis  is  accomplished 
through  a network  tool  called  the  Railroad- 
Network  Analysis  System  (R-NAS).  R-NAS 
models  the  U.S.  national  rail  network  and 
estimates  the  impact  to  national  rail 
commodity  flows  given  disruptions  to  the  rail 
system  (bridges,  rail  yards,  and  so  forth). 

2.2.5 NISAC Agent-Based
Laboratory for Economics (N-ABLE™)

Dynamic  supply  chain  analysis  is  conducted 
using  the  NISAC  Agent-Based  Laboratory 
for  Economics™  (N-ABLE™),  a large-scale 
microeconomic  supply  chain  model  and  tool 
that  allows  for  the  analysis  of  the  impacts  to 
individual firms (production, sales,
transportation, and inventories) and the
broader supply chain over time (output,
shipments, and inventories) resulting from
disruptions to firms and transportation
networks [2], [3]. N-ABLE™ draws on the
results  of  the  other  analysis  tools  and 
subject  matter  expertise  to  define  disruption 
parameters  and  simulate  individual  firm 
behaviors  within  the  modeled  supply  chains. 
Figure 2 shows a representation of a
typical N-ABLE™ enterprise firm, which
contains different types of decision
makers with objectives who interact with
each other, with supporting CIKR, and with
upstream and downstream ‘markets’ for
input commodities and output products.
Entire  supply  chains  are  constructed  from 
collections  of  firms,  based  on  this  enterprise 
design,  with  each  participating  firm 
interacting  with  others  through  markets  and 
physical  infrastructure. 


Figure  2.  N-ABLE™  Enterprise  Model  of  an 
Economic  Firm. 

N-ABLE™  simulation  results  provide 
quantitative  and  qualitative  information 
necessary  for  consequence  analyses.  For 
example,  if  a hurricane  temporarily  shuts 
down  a set  of  chemical  production  facilities, 
N-ABLE™  estimates  economic  impacts 
resulting  from  a decreased  chemical  supply 
to  downstream  facilities  (e.g.,  customers  of 
the  closed  facilities,  the  customers  of  the 
customers,  etc.).  N-ABLE™  also  estimates 
losses resulting from a decreased demand
for input chemicals used by the closed
production plants to upstream facilities (e.g.,
suppliers to the closed plants, suppliers of
the  suppliers,  etc.).  These  economic  impact 
and  loss  estimates  can  be  used  to  measure 
the  systemic  impacts  to  the  chemical  supply 
chain  from  a hurricane. 
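
The notion of downstream and upstream cascades can be illustrated with a simple graph traversal. The sketch below is not how N-ABLE™ works (N-ABLE™ is an agent-based economic simulation); it only shows, under the assumption of a fixed supplier-to-customer network with hypothetical facility names, which facilities are reachable from a closed plant and after how many supply-chain hops.

```python
from collections import deque


def reachable(graph, start_nodes):
    """Breadth-first search returning {node: hops} for nodes reachable from start_nodes."""
    hops = {}
    seen = set(start_nodes)
    queue = deque((node, 0) for node in start_nodes)
    while queue:
        node, depth = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                hops[neighbor] = depth + 1
                queue.append((neighbor, depth + 1))
    return hops


def invert(graph):
    """Reverse every edge so that customers point back to their suppliers."""
    inverted = {}
    for source, targets in graph.items():
        for target in targets:
            inverted.setdefault(target, []).append(source)
    return inverted


# Hypothetical supplier -> customer edges, for illustration only.
customers = {
    "crude_refinery": ["naphtha_supplier"],
    "naphtha_supplier": ["ethylene_plant"],
    "ethylene_plant": ["polyethylene_plant", "vinyl_chloride_plant"],
    "vinyl_chloride_plant": ["pvc_plant"],
}

closed = ["ethylene_plant"]
downstream = reachable(customers, closed)        # customers, customers of customers, ...
upstream = reachable(invert(customers), closed)  # suppliers, suppliers of suppliers, ...

if __name__ == "__main__":
    print("downstream:", downstream)
    print("upstream:", upstream)
```

The hop count gives only a rough ordering of exposure; actual timing of impacts depends on inventories, alternate suppliers, and transportation, which a dynamic simulation is needed to capture.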

In  addition,  N-ABLE™  estimates  the  time 
necessary  for  the  system  to  recover  from  a 
disruption.  In  the  case  of  chemical  supply 
chains,  supply  interruptions  can  cascade 
through  many  other  sectors  at  different 
rates.  Some  downstream  consumers  will 
feel  the  impact  of  interrupted  production 
immediately, some will not feel the impact
until days or weeks later, and some will not
feel it at all. Inherent in N-ABLE™ is the
capability  to  represent  the  search  for  other 
supply  sources  when  losses  occur  and  any 
changes  in  transportation  costs  associated 
with  the  need  to  use  alternate  suppliers. 

The  cost  estimates  associated  with  the 
recovery  and  adaptation  processes  are 
crucial  to  estimating  supply  chain  recovery 
processes. 

2.3  Resilience  Analysis  Framework 

A uniform,  methodical  approach  for 
assessing  resilience  of  infrastructure 
systems  is  required  to  successfully 
incorporate  resilience  into  critical 
infrastructure  protection  (CIP)  policies  and 
business  planning  practices.  This  approach 
needs  to  be  general  enough  to  apply  to  all 
types  of  infrastructure  systems  to  account 
for  dependencies  between  different 
infrastructure  types  and  establish  standards 
across  all  infrastructure  types.  Furthermore, 
resilience  assessment  approaches  should 
explicitly  account  for  the  costs  of  recovery 
processes  in  comprehensive  disruption  cost 
evaluations. 

With these two requirements in mind,
Sandia has developed a novel framework
for  evaluating  the  resilience  of  infrastructure 
and  economic  systems  [4].  The  framework 
includes  a new  definition  of  resilience,  a 
mathematical  resilience  cost  measurement 
approach,  and  a qualitative  analysis 
methodology  that  assesses  system 
characteristics that affect resilience. This
framework can be applied to studies of
natural  and  manmade  disruptions. 


The  framework  as  developed  presents  a 
mathematical  resilience  cost  measurement 
approach  that  can  be  used  to  objectively 
determine  the  impacts  of  disruptions  on  a 
system  and  the  resilience  costs  associated 
with  disruptions.  The  resilience  cost 
measurement  approach  requires 
quantification  of  two  key  components  of  the 
definition  of  system  resilience:  systemic 
impact (SI) and total recovery effort (TRE).
SI  is  the  impact  that  a disruption  has  on 
system  productivity  and  is  measured  by 
evaluating  the  difference  between  a 
targeted  system  performance  (TSP)  level 
and  the  actual  system  performance  (SP) 
following  the  disruption.  TRE  refers  to  the 
efficiency  with  which  the  system  recovers 
from  a disruption  and  is  measured  by 
analyzing  the  amount  of  resources 
expended  during  the  recovery  process.  The 
measurement  of  system  resilience  costs 
requires  the  quantification  of  both  SI  and 
TRE. 

Figure  3 graphically  represents  systemic 
impact  for  a hypothetical  system  that  has 
been  disrupted.  In  this  example,  system 
performance  decreases  immediately 
following  the  disruption  shock.  With  the 
onset  of  recovery  actions,  performance 
levels  eventually  increase  and  ultimately 
attain  targeted  system  performance  levels. 
At  this  point,  recovery  is  considered 
complete.  SI  is  quantified  by  calculating  the 
area  between  the  TSP  and  the  actual  SP 
curves  in  Fig.  3.  This  area  is  calculated 
using  the  formula  in  Eq.  (1). 


Figure  3.  Systemic  Impact  (SI). 


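$$SI = \int_{t_0}^{t_f} \left[ TSP(t) - SP(t) \right] dt \qquad \text{Eq. (1)}$$

where $t_0$ is the time at which the disruption begins and $t_f$ is the time at which recovery is considered complete.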
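
When TSP and SP are available only as sampled time series (for example, daily simulation output), the area between the two curves can be approximated numerically. The following is a minimal sketch using the trapezoidal rule; the function names and the sample values are hypothetical.

```python
def trapezoid(times, values):
    """Approximate the integral of values over times with the trapezoidal rule."""
    total = 0.0
    for i in range(1, len(times)):
        total += 0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1])
    return total


def systemic_impact(times, tsp, sp):
    """Area between targeted (TSP) and actual (SP) system performance curves."""
    gap = [target - actual for target, actual in zip(tsp, sp)]
    return trapezoid(times, gap)


# Hypothetical series: performance drops after the shock and recovers by t = 4.
times = [0, 1, 2, 3, 4]
tsp = [100.0, 100.0, 100.0, 100.0, 100.0]
sp = [100.0, 60.0, 60.0, 80.0, 100.0]

if __name__ == "__main__":
    print(systemic_impact(times, tsp, sp))
```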

Figure  4 illustrates  the  recovery  response 
for  the  system  shown  in  Fig.  3.  After  the 
disruption  initiates,  the  recovery  response 
begins  and  resources  are  expended  in  this 
effort.  The  TRE  is  the  cumulative  amount  of 
resources  expended  during  the  recovery 
period  and  is  represented  by  the  area  under 
the recovery effort (RE) curve in Fig. 4. This
area  is  calculated  by  Eq.  (2). 


Figure  4.  Total  Recovery  Effort  (TRE). 

$$TRE = \int_{t_0}^{t_f} RE(t)\, dt \qquad \text{Eq. (2)}$$


System  performance  is  determined  by  the 
RE. That is, different REs lead to different
system  performances.  For  example,  if  no 
RE  is  made  following  the  disruption,  the  loss 
of  system  performance  may  be  great.  In 
contrast,  if  recovery  resources  are  deployed 
shortly  after  the  system  shock,  system 
performance  may  not  be  significantly 
affected, and SI may be small. The
recognition that SI is implicitly determined
by the selected recovery strategy leads to
the development of recovery-dependent
resilience (RDR) cost measurements. RDR
costs  are  the  resilience  costs  of  a system 
under  a particular  recovery  strategy  and  are 
calculated  with  Eq.  (3). 


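$$RDR(RE) = \frac{\int_{t_0}^{t_f} \left[ TSP(t) - SP(t) \right] dt}{\int_{t_0}^{t_f} |TSP(t)|\, dt} + \alpha \, \frac{\int_{t_0}^{t_f} RE(t)\, dt}{\int_{t_0}^{t_f} |TSP(t)|\, dt} \qquad \text{Eq. (3)}$$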


RDR  costs  are  linear  combinations  of  SI 
and  TRE.  The  denominators  in  Eq.  (3)  are 
normalization  factors  that  permit  the 
comparison  of  the  resilience  of  systems  of 
different  magnitudes.  Because  resilience 
represents a balancing of SI and TRE costs,
the calculation of RDR costs includes the
parameter α, which is a weighting factor that
allows an analyst to assign the relative
importance of the systemic impact and total
recovery effort terms. Assigning a small
positive value to α weighs the systemic
impact more heavily; a large positive value
for α weighs the cost of recovery more
heavily. To equally weigh SI and TRE, α is
set to 1.
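
The effect of the weighting factor can be seen in a small numerical sketch. It assumes that the RDR cost takes the form (SI + α·TRE) divided by the integral of |TSP(t)|, which is consistent with the description above of a normalized linear combination of SI and TRE; the function names and sample series are hypothetical.

```python
def trapezoid(times, values):
    """Approximate the integral of values over times with the trapezoidal rule."""
    total = 0.0
    for i in range(1, len(times)):
        total += 0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1])
    return total


def rdr_cost(times, tsp, sp, re, alpha=1.0):
    """Assumed form: (SI + alpha * TRE) normalized by the integral of |TSP|."""
    si = trapezoid(times, [target - actual for target, actual in zip(tsp, sp)])
    tre = trapezoid(times, re)
    norm = trapezoid(times, [abs(target) for target in tsp])
    return (si + alpha * tre) / norm


# Hypothetical series in consistent (e.g., monetary) units.
times = [0, 1, 2, 3, 4]
tsp = [100.0] * 5
sp = [100.0, 60.0, 60.0, 80.0, 100.0]
re = [0.0, 10.0, 10.0, 5.0, 0.0]

if __name__ == "__main__":
    for alpha in (0.5, 1.0, 2.0):
        print(alpha, rdr_cost(times, tsp, sp, re, alpha))
```

For these sample series SI = 100, TRE = 25, and the normalization is 400, so increasing α raises the RDR cost only through the recovery-effort term.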

In  addition  to  RDR  costs,  optimal  resilience 
(OR)  costs  can  also  be  considered,  but  their 
calculation  is  beyond  the  scope  of  this  work 
at  present. 

When  applied  to  the  CDM,  N-ABLE™ 
simulations  can  provide  quantitative  and 
qualitative  information  necessary  for 
resilience  analyses.  For  example,  if  a 
hurricane  temporarily  shuts  down  a set  of 
chemical  production  facilities,  N-ABLE™ 
can  estimate  economic  impacts  resulting 
from  a decreased  chemical  supply  to 
downstream  facilities  (e.g.,  customers  of  the 
closed  facilities,  the  customers  of  the 
customers,  etc.).  N-ABLE™  can  also  predict 
losses  resulting  from  decreased  demand  of 
input  chemicals  used  by  the  closed 
production  plants  to  upstream  facilities  (e.g., 
suppliers  to  the  closed  plants,  suppliers  of 
the  suppliers,  etc.).  These  economic  impact 
and  loss  estimates  can  be  used  to  measure 
the SI to the chemical supply chain from a
hurricane. 

In  addition,  N-ABLE™  can  predict  how  the 
chemical  sector  will  adapt  to  and  recover 
from  a disruption.  The  tool  has  the  capability 
to  estimate  production  curtailments  by  the 
customers  of  the  closed  plants  that  cannot 
find  new  suppliers,  the  higher  transportation 
costs  associated  with  new  suppliers,  the 
use  of  chemical  substitutes,  and  the 
implementation  of  different  production 
technologies  and  recipes  to  adapt  to  a 
disruption.  The  cost  estimates  associated 
with  the  recovery  and  adaptation  processes 
are  crucial  to  calculating  the  TRE  in  a 
resilience  analysis. 


3.0  DISCUSSION 

3.1  Analysis  Basis 

The  methodology,  models,  data,  and  other 
capabilities  described  above  have  been 
applied  to  a variety  of  homeland  security 
problems.  The  following  summary  of  an 
analysis  of  a Category  3 hurricane  making 
landfall  in  the  Gulf  Coast  is  an  example  of 
this  application. 

The  scenario  hurricane  was  patterned  after 
the  actual  Hurricane  Ike  (2008),  which 
developed  during  the  early  part  of 
September  and  made  its  landfall  over 
coastal Texas on September 13, 2008. The
storm  moved  at  a projected  forward  speed 
of  12  miles  per  hour  (mph),  carrying 
maximum  sustained  winds  of  over  100  mph. 
The  storm  size  was  approximately  230 
miles,  and  it  was  the  third  most  destructive 
hurricane  in  U.S.  history.  Figure  5 shows 
the  projected  path  of  Hurricane  Ike  early  on 
September  11,  2008. 


Figure  5.  Projected  Path  of  Hurricane  Ike, 
NOAA Advisory 41, 0400 CDT September 11,
2008. 


Sandia  used  the  scenario  storm  parameters 
(trajectory  and  category)  in  this  analysis  to 
first  estimate  the  damages  from  wind  and 
surge  waters.  These  damage  estimates  are 
translated  into  areas  of  probable  electric 
power  outage  and  inland  flooding  depths. 
Sandia  analysts  then  assessed  potential 
direct impacts to chemical facilities,
petroleum refineries, and the natural gas
network  for  elements  physically  affected  by 
this  scenario  storm.  Analysts  then  assessed 
the  indirect  impacts  to  facilities  and 
infrastructures  not  in  the  path  of  the 
hurricane  but  dependent  on  facilities  within 
the  disruption  area.  Finally,  analysts 
estimated  cascading  impacts  to  the 
chemical  industry  and  petrochemical  supply 
chain  at  a regional,  national,  and  global 
level.  Figure  6 shows  the  estimated  electric 
power  disruption  area  for  the  scenario 
hurricane.  Differences  in  color  reflect  the 
likelihood  of  power  outage  (green  reflecting 
a 0-  to  25-percent  probability  of  outage,  red 
representing  a 75-  to  100-percent 
probability  of  power  outage),  while  intensity 
of  color  reflects  the  projected  duration  of 
disruption  (lighter  shades  representing 
shorter  duration  of  outage  where  present, 
darker  shades  representing  longer  duration 
of  outage  where  present). 


Figure  6.  Estimated  Disruption  Area  of  the 
Scenario  Hurricane. 


It  is  common  practice  for  Gulf  Coast 
petrochemical  production  facilities  in  the 
projected  path  of  a hurricane  to  shut  down 
operations  48  hours  prior  to  hurricane 
landfall.  On  average,  the  petrochemical 
facilities  within  the  electric  power  outage 


contours  will  be  without  power  for  a few 
weeks.  Production  at  these  facilities  will  not 
likely  be  restored  immediately  following 
restoration  of  power.  Following  a plant 
shutdown,  petrochemical  facilities  often 
require  additional  startup  time  to  perform 
system  checks,  such  as  purging  pipelines 
and  vessels  with  inert  gases  such  as 
nitrogen,  to  ensure  the  unit’s  operability.  To 
simulate the cumulative effects of these factors,
analysts  assumed  that  all  petrochemical 
facilities  within  the  outage  contours  are  shut 
down  for  25  days. 

To  quantitatively  evaluate  the  resilience  of 
the  petrochemical  supply  chain,  we  ran  two 
sets  of  N-ABLE™  simulations.  In  the 
baseline  scenario,  we  assumed  no 
disruptions.  In  the  disruption  scenario,  we 
assumed  that  a hurricane  is  projected  to 
make  landfall  on  day  202  of  the  simulation 
and  the  electric  power  outage  shown  in 
Figure  6 is  expected  to  occur.  On  day  200, 
all  petrochemical  facilities  within  the 
contours  shut  down  in  anticipation  of  the 
storm.  Normal  production  capabilities  are 
assumed  to  return  on  day  225  of  the 
simulation. 

The  market  value  of  production  (MVP)  is  the 
metric  used  to  measure  SI.  MVP  captures 
total  “street  value”  of  every  step  of 
chemical-unit  production.  It  is  similar  to  the 
sale  value  of  end  products,  but  it  counts 
production  at  every  stage  in  the  production 
process,  whereas  the  sale  value  only  counts 
chemicals  that  are  sold  on  the  merchant 
market. MVP equals the sale value of end
products if there is absolutely no vertical
integration, i.e., if the outputs of every stage of
the production process are sold on the
merchant market.
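
The distinction between MVP and sale value can be illustrated with a minimal sketch. The two-stage chain, tonnages, and prices below are hypothetical and chosen only to show the bookkeeping; they are not taken from the N-ABLE™ data.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    tons_produced: float        # mass produced at this stage
    price_per_ton: float        # market ("street") price of the output
    tons_sold_on_market: float  # portion sold on the merchant market


def market_value_of_production(stages):
    # MVP counts production at every stage, whether sold or used internally.
    return sum(s.tons_produced * s.price_per_ton for s in stages)


def merchant_sale_value(stages):
    # Sale value counts only what reaches the merchant market.
    return sum(s.tons_sold_on_market * s.price_per_ton for s in stages)


# Hypothetical vertically integrated producer: all intermediate output
# is consumed internally, only the end product is sold.
integrated = [
    Stage("intermediate", 100.0, 1000.0, 0.0),
    Stage("end product", 80.0, 1500.0, 80.0),
]

# Same production, but every stage sells its output on the market.
unintegrated = [
    Stage("intermediate", 100.0, 1000.0, 100.0),
    Stage("end product", 80.0, 1500.0, 80.0),
]

mvp_integrated = market_value_of_production(integrated)
sales_integrated = merchant_sale_value(integrated)
mvp_unintegrated = market_value_of_production(unintegrated)
sales_unintegrated = merchant_sale_value(unintegrated)
```

With vertical integration, MVP exceeds sale value because the intermediate is counted in MVP but never sold; without it, the two measures coincide.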

For  this  analysis,  two  factors  are  considered 
in  determining  TRE:  additional  aggregate 
transportation  costs  (TC)  and  production 
plant shutdown/restart costs (RC). When a
disruption decreases the supply of available
chemicals, consumers of those chemicals
will seek new suppliers. These suppliers will
likely be farther from the consumers than
the original suppliers, so the cost of
transporting chemicals from the new
suppliers will likely be greater due to the
increased transportation distances.
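
A minimal sketch of the additional transportation cost follows, using the unit conversions given in the accompanying presentation ($3 per car-mile and one rail car per 100 short tons). The function name, the sample demands, and the choice to measure the distance increase as disrupted minus baseline average distance are assumptions made for illustration.

```python
COST_PER_CAR_MILE = 3.0      # dollars per car-mile (from the presentation)
SHORT_TONS_PER_CAR = 100.0   # one car per 100 short tons (from the presentation)


def added_transport_cost(met_demand_short_tons, base_avg_miles, disrupted_avg_miles):
    """Extra daily transportation cost in dollars.

    met_demand_short_tons: met demand per chemical for the day (short tons)
    base_avg_miles: average shipment distance in the baseline run
    disrupted_avg_miles: average shipment distance in the disrupted run
    """
    cars = sum(met_demand_short_tons) / SHORT_TONS_PER_CAR
    extra_miles = disrupted_avg_miles - base_avg_miles  # assumed definition
    return cars * extra_miles * COST_PER_CAR_MILE


# Hypothetical day: two chemicals, shipments 50 miles longer on average.
tc_example = added_transport_cost([50_000.0, 30_000.0], 400.0, 450.0)
```

Summing this quantity over the days of the disruption gives the TC contribution to TRE.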

Cost engineering estimates RCs as a
percentage of the capital costs of the
equipment involved. Based on available
data, pre-planned, short-term shutdowns
are generally less expensive. After a
literature review and consultation with project
subject matter experts and economists at
the National Center for Risk and Economic
Analysis of Terrorism Events (CREATE)
and the American Chemistry Council (ACC),
the authors used an RC of 3 percent of
capital costs.

For  the  sake  of  simplicity,  we  only  consider 
the  TCs  and  RCs  when  calculating  the  TRE 
for this example. To calculate RDR costs,
we set α to 1 in Eq. (3) and approximate the
integral with 1-day time-step intervals
because  N-ABLE™  reports  data  on  a daily 
basis. 
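
The daily approximation can be sketched as follows. This assumes the resilience cost has the form shown in the accompanying presentation, (SI + α × TRE) divided by the integral of targeted system performance, with SI taken as the accumulated gap between target and disrupted MVP. The five-day series are hypothetical.

```python
def daily_sum(values, dt_days=1.0):
    """Rectangle-rule approximation of an integral sampled once per day."""
    return sum(v * dt_days for v in values)


def resilience_cost(target_mvp, disrupted_mvp, recovery_effort, alpha=1.0):
    # Systemic impact: accumulated shortfall of MVP relative to the target.
    si = daily_sum(t - d for t, d in zip(target_mvp, disrupted_mvp))
    # Total recovery effort: accumulated daily recovery expenditure.
    tre = daily_sum(recovery_effort)
    # Normalizing term: accumulated targeted system performance.
    tsp = daily_sum(abs(t) for t in target_mvp)
    return (si + alpha * tre) / tsp


# Hypothetical daily series in arbitrary dollar units.
target = [10.0, 10.0, 10.0, 10.0, 10.0]
disrupted = [10.0, 6.0, 6.0, 8.0, 10.0]
effort = [0.0, 1.0, 1.0, 0.0, 0.0]

cost = resilience_cost(target, disrupted, effort)
```

Here SI is 10, TRE is 2, and the normalizing term is 50, so the sketch yields 0.24.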

Figure 7 shows MVP as a function of time
for the base case and the scenario
hurricane for the whole Ethylene supply
chain. Utilization of inventories (on hand and
in transit) helps to buffer some of the effects
of the disruption.

Figure 7. MVP as a function of time, Ethylene
Supply Chain, Base and Hurricane
Scenarios.


Figure  8 shows  average  shipment  distance 
as  a function  of  time  for  the  base  case  and 
the  scenario  hurricane  for  the  whole 
Ethylene supply chain. The inventory
utilization described in Figure 7 comes at a
cost, which is reflected in the
calculation of TC and, as a result, in TRE.



Figure  8.  Average  Shipment  Distance  as  a 
Function  of  Time,  Ethylene  Supply  Chain, 
Base  and  Hurricane  Scenarios. 

Table  1 shows  the  comparative  analysis  of 
the  calculation  of  System  Resilience  for  the 
whole  Ethylene  supply  chain  and  for  a 
segment,  Vinyl  acetate  monomer  (VAM). 
Impacts  to  VAM  production  are  more  severe 
than  the  aggregate,  transportation  distances 
and  costs  greater,  and  recovery  period 
longer.  As  such,  the  VAM  resilience  metric 
is  larger  by  one-third  than  that  of  the 
Ethylene  supply  chain  as  a whole  (here,  a 
lower  value  reflects  a more  resilient 
system). 

Table  1.  Comparison  of  Resilience  Values, 
Hurricane  Scenario,  VAM  and  Ethylene 
Supply  Chains 


Measure            VAM     Ethylene

Target MVP ($M)    856     49,000
SI ($M)            88      4,000
TRE: RC ($M)       11      256
TRE: TC ($M)       1.5     254
Resilience         0.12    0.09

Resilience = [SI + (RC + TC)] / Target MVP
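
The resilience values in Table 1 can be checked directly from the other rows. The sketch below takes TRE as the sum of RC and TC with α set to 1, as described in the text; the dictionary layout and function name are illustrative only.

```python
# Values from Table 1, in millions of dollars.
TABLE_1 = {
    "VAM":      {"target_mvp": 856.0,   "si": 88.0,   "rc": 11.0,  "tc": 1.5},
    "Ethylene": {"target_mvp": 49000.0, "si": 4000.0, "rc": 256.0, "tc": 254.0},
}


def resilience(row, alpha=1.0):
    # Total recovery effort is restart cost plus added transportation cost.
    tre = row["rc"] + row["tc"]
    return (row["si"] + alpha * tre) / row["target_mvp"]


values = {name: round(resilience(row), 2) for name, row in TABLE_1.items()}
```

Rounded to two decimals, this gives 0.12 for VAM and 0.09 for the Ethylene chain, matching the table.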

A more detailed discussion of the
consequence and resilience analysis results
for this scenario will be presented at
MODSIM World 2010.

4.0  CONCLUSION 

Analysis of chemical supply chains is an
inherently complex task, given the
dependence of these supply chains on
multiple infrastructure systems (e.g.,
transportation and energy). The capability
developed at Sandia is intended to inform
the DHS about the
consequences of large-scale disruptions to
the chemical sector, including interrupted
supply and the resulting economic impacts to
the nation. Response and recovery officials
can use this information for
more effective pre-event planning and more
knowledgeable event response. The
ongoing development effort includes
several tools along with a
comprehensive database that feeds the
tools. The database is constructed by
merging many datasets to
provide a high degree of resolution,
so that individual plants can be
uniquely represented.

The  hurricane  disruption  scenario  presented 
herein  shows  that  large-scale  disruptions  to 
petrochemical  supply  chain  elements  affect 
many  supply  chains  and,  consequently,  take 
considerable  time  to  recover  (Figures  7 and 
8).  Supporting  this  result  in  the  scenario 
analysis,  information  reported  in  Chemical 
Week  showed  that 

Several  Texas  Gulf  Coast  chemical 
plants  began  to  restart  operations  after 
shutting  down  ahead  of  Hurricane  Ike's 
landfall  on  September  13,  2008. 

However,  producers  claim  that  the 
ready  availability  of  utilities,  raw 
materials,  and  logistics,  and  the 
damage  at  some  customer  sites 
negatively  affect  their  effort  to  restart 
operations  [5]. 

Disruptions to chemical plants cascade
both up and down the supply chain,
affecting recovery efforts.

Outline


• Introduction 

• Technical  Elements 

- Chemical  data  model  (CDM) 

- Consequence  Analysis  Tools 

- Resilience  Analysis  Framework 

• Measuring  Resilience 

• Summary 

• Questions 


Introduction

• Chemical  sector  is  highly 
connected  to  multiple 
infrastructures  and 
commercial  sectors 

• Consequence  analysis 
capability  must  consider 

• Disruptions  of  the  chemical 
sector 

• Disruptions  of  interdependent 
infrastructures 

• National,  regional,  and  facility 
perspectives

Introduction


Supply chains face an array of threats

• "Protection, in isolation, is a brittle strategy"

• Effective integration of resilience into critical
infrastructure protection policies requires

- Consistent, broadly applicable definitions
and methods

- Objective methods for measuring
progress

- Comprehensive accounting of resource-
constrained recovery strategies

[Figure panels: Hurricane; Commodity Flow Impacts]

"We are working every day to ensure our country stands ready to respond to any disaster or
emergency - from wildfires and hurricanes, to terrorist attacks and pandemic disease. Our
goal is to ensure a more resilient Nation."

- President Barack Obama, September 4, 2009


Introduction 

• We  need  to  address  direct  impact  questions 

• What  is  the  area  of  direct  impact? 

• What  chemical  facilities  are  directly  affected? 

• What  percentage  of  capacity  does  this  represent? 

• And  cascading  impact  questions 

• How  long  before  we  return  to  ‘normal’? 

• What  additional  facilities  will  be  affected? 

• We  also  need  to  be  able  to  examine  systemic 
resilience 

• Define 

• Calculate

Introduction

Organization: DHS Science and Technology (S&T) Directorate, Infrastructure and Geophysical Division
Role: Manage the chemical supply chain and resilience project

Organization: Sandia National Laboratories, Interdependencies and Consequence Management Group
Role: Develop analysis and design capabilities

Organization: National Infrastructure Simulation and Analysis Center (NISAC) (managed by DHS Office of Infrastructure Protection [IP])
Role: Apply completed capabilities to disruptions of critical infrastructures and key resources (CIKRs)

Technical Elements


• Chemical Data Model (CDM)

Technical Elements:

CDM

Data foundation for the project

- All models are driven from the same set of core input
data

- Differences in model output are due to modeling
approach

Technical Elements:

CDM


■ Project models and analysis tools need the following
information


• Plant  facilities 

- Name 

- Location (address and geocodes)

• Facility  productions 

- Chemical  types 

- Quantities 

- Processes 

• Infrastructure 
dependencies 

- Transportation  (rail,  pipeline,  etc.) 

- Energy  (electric  power,  natural 
gas,  petroleum  products) 

- Quantities 


• Consumption 

- Categories 

- Locations 

- Quantities 

• Imports/exports 

- Locations 

- Quantities 

• Other  factors 

- Economics 

- Population 
distribution 

- Emergency services

Technical Elements:

CDM


Dataset Name                                         Provider

World Petrochemicals Program 2009                    SRI Consulting
Chemical Economics Handbook 2009                     SRI Consulting
Directory of Chemical Producers 2009                 SRI Consulting
Oil & Gas Pipelines                                  NGA HSIP Gold 2003 (PennWell)
Oil & Gas Facilities                                 NGA HSIP Gold 2008 (PennWell)
United States Census 2000                            U.S. Census Bureau
County Business Patterns 2007                        U.S. Census Bureau
County Business Patterns Employees Estimation 2007   U.S. Census Bureau

Technical Elements:

CDM


Dataset Name                                         Provider

Geographic Names Information System                  U.S. Geological Survey
IMPLAN States Summary 2002                           Minnesota IMPLAN Group
International Trade Statistics 2007                  U.S. Department of Commerce
Refinery Location Data                               Argonne National Laboratory*
2005 Commodity Flow Survey                           Department of Transportation
2005 Waybill Sample                                  Surface Transportation Board
2007 Class 1 Railroad Statistics                     Association of American Railroads
2007 Producer Price Index                            Department of Labor
E-Plan Emergency Response Information System         U.S. Environmental Protection Agency/U.S. Department of Homeland Security

*Argonne data were updated using 2007 domestic data from the Energy Information
Administration (EIA) and foreign data from SRI Consulting.

Technical Elements:

CDM 


• CDM  Building  Process 

- Gather 

- Process  and  integrate 

• Merge  datasets  into  a 
common,  Oracle-based 
framework 

- Authenticate 

- Document 

- Ensure  traceability 

- Test 

• Ensure  compatibility  with 
models 

- Iterate

Technical Elements:
Consequence  Analysis  Tools 


N-ABLE™ 


FASTMap 


R-NAS 


FAIT 


Information  Sharing 


Network  Analysis 


Time  to  respond 


Dynamic  Supply  Chain  Analysis  with  the 
NISAC  Agent-Based  Laboratory  for 
Economics  (N-ABLE™) 


The  networks  of 
enterprises  comprise 
national,  regional,  and 
local  markets 


Individual 
enterprises  are 
combined  to 
create  networks 
of enterprises

Model Foundation of N-ABLE™:
The Enterprise Firm


• N-ABLE™

• Generates data-driven
microeconomic
“enterprises”

• Simulates enterprise
operations (buyers,
production, sellers,
inventories, and shipping)

• Identifies interactions in
markets and dependencies
on critical infrastructures

• Estimates how enterprises
respond individually and
collectively to disruptions

Resilience Framework:
A Definition  of  Resilience 


• Resilience  is  contextual 

• Relative  to  disruptive  event  and  performance  targets 

• System  performance  is  a fundamental  factor 

• Structure  not  as  important  as  performance 

• We  consider  magnitude  and  duration 

• We  do  not  assume  a system  will  return  to  pre-disruption  state 

• Resource  expenditure  in  recovery  processes  a 
fundamental  consideration 

- We  consider  the  ability  to  efficiently  reduce  system  impacts  to 
absorb, adapt, and/or recover

Resilience Framework:
Calculation of Resilience Costs


[Figure: system performance and recovery effort RE(t) plotted against time. The shock occurs at t = t0, recovery effort commences following the shock, and recovery is complete at t = tf, so Duration = tf - t0. The recovery-effort panel marks the Total Recovery Effort (TRE).]

\[
\text{Resilience Costs} = \frac{SI + \alpha \times TRE}{\int_{t_0}^{t_f} \left| TSP(t) \right| \, dt}
\]

Resilience Framework:
Qualitative  Resilience  Assessment 


Resilience Component: Systemic Impact | Total Recovery Effort

Determining Features: Absorptive Capacity | Adaptive Capacity | Restorative Capacity

Distinguishing Characteristics of Capacity:
- Absorptive: considers aspects that automatically manifest after the disruption
- Adaptive: considers internal aspects that manifest over time after the disruption
- Restorative: considers ability to affect and repair internal system features

Effort Required:
- Absorptive: Automatic/Little Effort
- Adaptive: Internal Effort Required
- Restorative: External Effort Often Required

Measurement of Component: Internal Measurement | Exogenous Measurement

Resilience Framework:
Resilience  Analysis  Process 


■ To  apply  the  conceptual  framework  to 
the  chemical  sector,  we  must  define 
several  key  components  of  the  analysis 
process,  such  as 

• Chemicals  under  consideration 

• Performance  metric 

■ We  require  subject  matter  expertise  for 
this  process 

■ We  will  demonstrate  the  resilience 
analysis  process  for  a scenario 
hurricane

Measuring Resilience:
Scenario Assumptions


• Hurricane makes landfall and
affects plants in electric power
(EP) outage contours

• Facilities within the outage
contours shut down 2 days prior
to landfall

• These facilities are
nonoperational for an additional
23 days

• All facilities that are within
outage contours require startup
processes

[Figure: Estimated Power Outages & Restoration Times]

Measuring Resilience:
Defining  Metrics 


• Market  value  of  production  (MVP) 

- Total  “street  value”  of  every  step  of  production  for  all  chemicals 
and  facilities 


\[
MVP(t) = \sum_{i,j} m_{i,j}(t) \times p_i
\]

where m_{i,j}(t) is the mass of chemical i produced at facility j (the source's symbol for this term is illegible; m is used here) and p_i is the price of chemical i per unit mass.

- Systemic impact metric = MVP for disrupted conditions
- Targeted system performance = MVP for undisrupted conditions

Measuring Resilience:
Defining  Metrics  (continued) 


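• Total recovery effort metric 1:

- Additional transportation costs (TC) due to increased transport
distances

\[
TC(t) = \left[ \sum_i MD_i^t \times \frac{1\ \text{car}}{100\ \text{short tons}} \right] \times \Delta D_{ave}^t \times \frac{\$3}{\text{car-mile}}
\]

where MD_i^t is the met demand for chemical i (short tons), \Delta D_{ave}^t is the increased average distance per shipment (the slide writes this as a difference of two D_ave terms whose superscripts are illegible in the source), and $3/car-mile is the cost.

Measuring Resilience: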
Defining  Metrics  (continued) 


• Total recovery effort metric 2:

• Production plant shutdown/restart costs: cost engineering
estimates these costs as a percent of capital costs
- Pre-planned, short-term shutdown is generally less expensive
- After consultation with the project chemical subject matter expert, the literature,
and economists at the National Center for Risk and Economic Analysis of
Terrorism Events (CREATE) and the American Chemistry Council (ACC),
we use 3 percent of capital costs to estimate shutdown/restart costs

\[
RC = 0.03 \times \sum_i CC_i
\]

where CC_i is the capital cost per plant.

Restart Costs as a Percent of Capital Costs

Source                          Range     Median

Perry (2008)                    -         [illegible]
Peters and Timmerhaus (1968)    0.5-2%    1.3%
Peters and Timmerhaus (1980)    8-10%     [illegible]
Price (2000)                    5-20+%    -

Measuring Resilience:
Calculation of Resilience Costs

Recall that

\[
\text{Resilience Costs} = \frac{\text{Systemic Impact} + \alpha \times \text{Total Recovery Effort}}{\text{Targeted System Performance}}
\]

• Therefore, for this analysis:

\[
RC = \frac{(MVP_{Baseline} - MVP_{Disrupted}) + TC + RSC}{MVP_{Baseline}}
\]

On this slide, RC denotes resilience costs and RSC denotes restart costs.

Measuring Resilience:
Obtaining  Data  through  N-ABLE™ 

Simulations 

• Two  sets  of  N-ABLE™  1-year  simulations  were  executed 

- Baseline  and  disrupted  conditions 

• In  the  disrupted  simulation 

- Plant shutdown is assumed to occur on day 200, and

- Affected plants are assumed to be fully operational on day 225

• Simulations  provide  MVP  and  TC  data 

• Restart  costs  (RC)  are  estimated  external  to  the  simulation 


Measuring Resilience:
Systemic Impact,
Whole Ethylene Supply Chain


When is
recovery
complete?

Measuring Resilience:
Adaptive  Behaviors,  Whole  Ethylene 

Supply  Chain 


Measuring  Resilience: 
Calculating  Resilience  Costs 


Measure                                VAM     PVC Chain   Entire Chain

Recovery “Complete” on Day             264     260         250
Target MVP ($M)                        856     14,800      49,000
Systemic Impact ($M)                   88      1,100       4,000
Recovery Effort: Restart ($M)          11      23          256
Recovery Effort: Transportation ($M)   1.5     9.6         254
Resilience Cost                        0.12    0.08        0.09

Resilience Cost = (Systemic Impact + Total Recovery Effort)/Target MVP

Measuring Resilience:
Calculating  Resilience  Costs 


• Systemic  impact  dominates  recovery  costs  in  all  systems 

• Restart  costs  far  outweigh  increased  transportation  costs 

• Restart  and  transportation  costs  for  VAM  are  relatively 
high 

• Though  the  VAM  system  is  the  smallest,  it  is  also  the 
least  resilient 

-Simplicity  may  hinder  resilience 


Summary

• This  project  takes  a multidisciplinary  approach  to 
chemical  supply  chain  modeling  and  resilience  analysis 

• We  have  integrated  our  consequence  analysis 
capabilities  into  a resilience  framework  to  enhance 
analytic  capabilities 

• We  plan  to  continue  capability  development  efforts  this 
year 

• We welcome, encourage, and value feedback

Questions?

4.5 Integrated Modeling, Mapping, and Simulation (IMMS) Framework for
Exercise  and  Response  Planning 


Integrated  Modeling,  Mapping,  and  Simulation  (IMMS)  Framework 
for  Exercise  and  Response  Planning 

Jalal Mapar

Abstract. Emergency management personnel at federal, state, and local levels can benefit from the increased situational awareness
and operational efficiency afforded by simulation and modeling for emergency preparedness, including planning, training, and
exercises. To support this goal, the Department of Homeland Security’s Science & Technology Directorate is funding the Integrated
Modeling, Mapping, and Simulation (IMMS) program to create an integrating framework that brings together diverse models for use
by the emergency response community. SUMMIT, one piece of the IMMS program, is the initial software framework that connects
users such as emergency planners and exercise developers with modeling resources, bridging the gap in expertise and technical
skills between these two communities. SUMMIT was recently deployed to support exercise planning for National Level Exercise
2010. Threat, casualty, infrastructure, and medical surge models were combined within SUMMIT to estimate health care resource
requirements for the exercise ground truth.


1.0  INTRODUCTION 

1.1  Exercise  Management 

Exercises  provide  a way  to  assess  and 
validate  the  speed,  effectiveness  and 
efficiency  of  emergency  management 
capabilities,  and  test  the  adequacy  of 
policies,  plans,  procedures,  and  protocols  of 
emergency  response  in  a risk-free 
environment.  In  2002,  the  U.S.  Department 
of  Homeland  Security  (DHS)  and  Federal 
Emergency  Management  Agency  (FEMA) 
developed  policies,  incorporated  into  the 
Homeland  Security  Exercise  and  Evaluation 
Program  (HSEEP)  [1],  to  guide  the  design, 
development,  conduct,  and  evaluation  of 
exercises.  This  served  as  an  opportunity  to 
standardize  the  language  and  concepts 
used  in  the  exercise  planning  and 
evaluation  process  for  the  homeland 
security  community. 

Under  DHS  leadership,  the  National 
Exercise Program (NEP) [2,3] provides a
framework for prioritizing and coordinating
federal,  regional  and  state  exercise 
activities,  without  replacing  any  individual 
department  or  agency  exercises.  NEP 
defines  four  tiers  of  exercises: 

Tier I: White House directed, U.S.
Government-wide Strategy & Policy Focus,
Full Participation

Tier II: Federal Strategy & Policy Focus,
Significant Simulation

Tier III: Other Federal Exercises,
Operational, Tactical or Organizational
Focus, Simulation

Tier IV: State, Territorial, Local, Tribal or
Private Sector Focus

Typically, Tier I and II exercises are supported by
FEMA’s National Exercise Division (NED).
Each year one exercise is designated as the
National Level Exercise (NLE), a Tier I
event requiring senior-level participation
among the Federal interagency community.
NLEs are full-scale exercises that are
typically scheduled five years in advance and
planned for up to two years. A
full-scale exercise is a multi-agency,
multi-jurisdictional, multi-discipline exercise
involving functional and "boots on the
ground" response.

The  role  of  modeling  and  simulation  (M&S) 
in  HSEEP  exercises  is  still  evolving.  Though 
dozens  of  federally-funded  modeling  efforts 
have  been  identified  that  are  relevant  for 
emergency  response  to  hazards  and  threats 
and  that  could  greatly  enhance  exercise 
planning  and  conduct,  there  is  no  formal 
mechanism  for  the  emergency  management 
community  to  discover,  access  and  use 
these M&S capabilities. Furthermore, the
wide range and disparity of M&S capabilities
for emergency management necessitate a
way to integrate and run models.

1.2  IMMS  Program 

Realizing the opportunities to enhance
exercises and planning through modeling
and simulation, the Department of
Homeland Security Science and
Technology Directorate (DHS/S&T)
spearheaded the Integrated Modeling,
Mapping, and Simulation (IMMS) program.¹
IMMS is a research and development effort
to develop a common framework for
integrating incident-related M&S tools to
enhance the situational awareness and
operational efficiency of emergency
managers and exercise planners. Among
the many applications is support to
exercises in Tiers I-IV. Recognizing
that the modeling and simulation community
is large and highly diverse, DHS/S&T is
providing, through IMMS, the M&S and
exercise communities a way to discover
models, integrate them quickly and
economically, and apply them in analyses to
improve exercise, planning, and response
efforts. The central technological
component of IMMS is the Standard Unified
Modeling and Mapping Integration Toolkit
(SUMMIT), which connects users such as
emergency planners and exercise
developers with modeling resources in an
easy-to-use format. This paper provides a
high-level overview of the SUMMIT
architecture and a description of IMMS
support to National Level Exercise 2010
(NLE10). NLE10 lessons learned from M&S
support of exercises are being incorporated
as requirements and concepts of operation
(ConOps) for SUMMIT, and are being used
to develop and refine the IMMS vision.

¹ "High-Priority Technology Needs", Department of
Homeland Security Science and Technology
Directorate, Version 2.0, June 2008. The Incident
Management Integrated Product Team (IPT), led by
FEMA and the Office of Emergency Communications,
identified an integrated modeling, mapping and
simulation capability as a high-priority technology.

2.0  BODY 

2.1  SUMMIT 

To  create  a capability  for  linking  together 
M&S  tools,  SUMMIT  is  being  iteratively 
designed  and  prototyped.  SUMMIT  provides 
a platform-neutral  framework  that  brings 
together  distributed  M&S  codes  and  a wide 
range  of  users.  The  framework  makes  it 
easier  to  discover  and  integrate  models, 
provision  them  for  a specific  scenario, 
execute  models  on  available  resources,  and 
deliver  results  to  a collaborating  set  of 
users. 

The  SUMMIT  architecture  allows  for 
considerable  flexibility,  placing  few 
restrictions  on  federated  models,  but  still 
providing  necessary  capabilities  for 
integration.  Model  owners  decide  who  has 
permission  to  use  their  code,  and  where  it  is 
hosted.  Exercise  planners  and  other 
emergency  response  users  link  models  as 
needed  to  address  specific  scenarios. 

For  example,  suppose  the  exercise  scenario 
is  the  release  of  chlorine  gas  from  a railcar 
in  an  urban  setting,  and  the  emergency 
responder  wants  to  know  if  there  are 
sufficient  medical  supplies  for  first  response. 
A computational  approach  (Fig.  1)  might 
incorporate  a finite  element  model  that 
computes  the  chemical  gas  dispersion 
plume  for  given  weather  conditions,  another 
model that quantifies casualties in the
population at risk, and a third model that
tallies available medical resources.

SUMMIT makes it possible for the planner
to link these three models, execute them,
and view results, all from a single client
interface.

Figure 1 depicts a SUMMIT simulation
template, an abstraction that shows how
models are connected to address a specific
scenario. Although the template displays
only a high-level view, “under the hood”
there is sufficient detail to define a software
federation of models that can execute
automatically.

Figure 1. Example simulation template
linking  three  models  (boxes)  via  data  flows 
(arrows). 
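
The idea of a simulation template as models linked by data flows can be sketched in a few lines. This is not the SUMMIT API: the template dictionary, the `run_template` function, and the three toy models (with their coefficients) are hypothetical stand-ins, and the plume model here is a placeholder rather than a physical dispersion calculation.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def plume_model(data):
    # Toy stand-in for a dispersion model; not physically meaningful.
    return {"exposed_area_km2": data["release_tons"] * 0.5}


def casualty_model(data):
    exposed_people = data["exposed_area_km2"] * data["people_per_km2"]
    # Hypothetical casualty fraction of 1 percent.
    return {"casualties": round(exposed_people * 0.01)}


def medical_model(data):
    shortfall = max(data["casualties"] - data["available_beds"], 0)
    return {"bed_shortfall": shortfall}


# Template: model name -> (callable, names of upstream models).
TEMPLATE = {
    "plume": (plume_model, []),
    "casualties": (casualty_model, ["plume"]),
    "medical": (medical_model, ["casualties"]),
}


def run_template(template, scenario_inputs):
    """Run each model after its upstream models, passing outputs along."""
    graph = {name: deps for name, (_, deps) in template.items()}
    data = dict(scenario_inputs)
    for name in TopologicalSorter(graph).static_order():
        model, _ = template[name]
        data.update(model(data))
    return data


result = run_template(
    TEMPLATE,
    {"release_tons": 60.0, "people_per_km2": 2000.0, "available_beds": 400},
)
```

Ordering the models by their dependencies is what lets a template that shows only boxes and arrows be executed automatically; a real federation would also handle remote hosting, permissions, and data formats.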


Software  components  in  SUMMIT  are 
divided  between  a central  server,  user 
clients,  and  executable  models  that  are 
often  (but  not  always)  hosted  on  the 
machines  of  model  owners.  Figure  2 shows 
the  main  components  of  the  architecture. 
The  “data”  and  “model”  icons  in  the  figure 
are  components  owned  by  model 
contributors.  All  other  icons  are  part  of  the 
SUMMIT  framework.  A SUMMIT  Client 
component  allows  for  interaction  with  the 
user,  and  a SUMMIT  Software  Development 
Kit  (SDK)  provides  tools  for  model  owners 
to  integrate  their  models  and  verify  that  they 
are  SUMMIT-compliant.  Components  in  the 
SUMMIT  core  server  provide  system 
functionality  through  a set  of  distributed 
services. 


Figure  2.  SUMMIT  architecture. 

SUMMIT  provides  for  three  types  of  users: 
emergency  responders  and  exercise 
planners  (the  primary  end  user),  model 
owners,  and  scenario  planners.  End  users 
access  content  through  a SUMMIT  Client. 
They  log  into  the  system,  discover  an 
appropriate  simulation  template  (such  as 
Fig.  1),  configure  inputs,  and  then  view 
results  after  SUMMIT  automatically 
executes  the  models  that  compose  the 
simulation template. Model owners use
SUMMIT SDK tools to create a software
wrapper that enables execution of their
model as part of a SUMMIT-mediated
federation of models. (Note that in this
paper the term “federation of models” does
not refer to a High Level Architecture [4]
type of federation, but to a collection of
models run consecutively with
interconnected data.) For example, the
three models in the chlorine gas scenario
described earlier might be contributed by
three different model owners and hosted at
three different remote sites. Scenario
planners use SUMMIT SDK tools to create
simulation templates that specify inputs and
needed outputs and bring together models
for a specific incident scenario. In the
chlorine gas example, a scenario planner
created the simulation template by linking
three models at a conceptual level to
produce the desired output.

Further details on the coordination and
execution of models in the SUMMIT
framework are discussed in references [5]
and [6].

SUMMIT  has  created  a flexible  external 
interface  that  allows  for  multiple  client 
environments.  These  include  a native  rich 
client  platform,  a browser-based  client,  and 
interfaces  to  advanced  commercial  and 
government GIS technologies. Utilization of
these  environments  allows  users  to  grasp 
complex  multivariate  data  quickly  and 
intuitively.  The  IMMS  team  is  also 
investigating  the  capabilities  of  virtual 
environment  technology  to  bring 
telepresence  and  a heightened  sense  of 
situational  awareness  into  exercises. 

SUMMIT  is  also  providing  external 
interfaces  so  that  its  M&S  integration 
capabilities  can  be  leveraged  by  external 
tools.  For  example,  exercise  management 
tools  will  be  able  to  access  SUMMIT- 
archived  data,  allowing  for  even  greater 
intra-exercise  coordination. 

2.2  National  Level  Exercise  2010 

To better understand the potential use
and role of IMMS in exercises, the IMMS
team participated in NLE10 in May 2010 [7].
The  objective  of  this  first  pilot  was  to  apply 
the  existing  reference  implementation  of 
SUMMIT  in  a live  exercise  so  that 
architecture  requirements  and  concepts  of 
operation  (ConOps)  could  be  evaluated  and 
improved.  The  specific  scenario  for  NLE10 
was  response  and  recovery  from  incidents 
involving  an  improvised  nuclear  device 
(IND)  detonation  in  a U.S.  city.  The  scenario 
was  derived  from  National  Planning 
Scenario  1:  IND  Detonation  [8].  The  major 
objectives  of  NLE10  were  to  exercise: 

• Intelligence  and  information  sharing 
and  dissemination 

• Incident  Management 

• Critical  Infrastructure  protection 

• Medical  Surge 

• Public  information 

• Continuity  of  Operations  (COOP) 

• Economic  and  Community  Recovery 


NLE10  was  conducted  in  the  National 
Capital  Region.  Exercise  players  in  NLE10 
represented  over  60  federal  agencies, 
including  U.S.  Department  of  Defense, 
Central  Intelligence  Agency,  Department  of 
Energy,  Department  of  Health  and  Human 
Services,  DHS,  Department  of  Justice, 
Department  of  State,  Department  of 
Transportation,  and  Environmental 
Protection  Agency.  Due  to  a late  venue 
change,  this  NLE  consisted  of  federal  play 
only,  and  no  local  or  state  play;  however, 
representatives  from  local  and  state 
governments  and  FEMA  Regional  Office  V 
contributed  to  the  planning  process,  and 
participated  in  the  Simulation  Cell  (SimCell), 
providing  “boots  on  the  ground”,  realistic 
scenario  injects  that  drove  the  operations- 
based  exercise  play. 

Exercise  conduct  consisted  of  a Master 
Control  Cell  (MCC)  releasing  injects  from 
the  Master  Scenario  Event  List  (MSEL)  to 
exercise  players.  The  MCC  included: 

• A control  room  acting  as  the  key  node  of 
communication,  hosting  both  exercise 
controllers  (who  plan  and  manage 
exercise  play)  and  evaluators  (who  track 
action  relative  to  evaluation  criteria  and 
analyze  exercise  results  without 
disturbing  exercise  flow)  from  62  federal 
departments  and  agencies. 

• A SimCell  hosting  representatives  from 
the  region,  state,  local,  international, 
private  sector,  law  enforcement,  etc., 
which  provided  injects  to  and  answered 
requests  for  information  from  the 
exercise  players. 

MSEL  injects  were  released  via  phone, 
email,  fax  and  the  DHS  Lessons  Learned 
Information Sharing (LLIS) portal [9].
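
The release of MSEL injects over time can be pictured with a small queue. The following is a minimal sketch, not the tooling used in NLE10: the class names, the minute-based clock, and the inject texts are all illustrative assumptions; only the delivery channels (phone, email, fax, portal) come from the description above.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Inject:
    """One MSEL entry; only the release time takes part in ordering."""
    release_minute: int                   # minutes after exercise start (assumed unit)
    text: str = field(compare=False)      # illustrative inject wording
    channel: str = field(compare=False)   # "phone", "email", "fax" or "portal"


class Msel:
    """Holds pending injects in a min-heap keyed on release time."""

    def __init__(self, injects):
        self._pending = list(injects)
        heapq.heapify(self._pending)

    def release_due(self, now_minute):
        """Pop and return every inject whose release time has arrived."""
        released = []
        while self._pending and self._pending[0].release_minute <= now_minute:
            released.append(heapq.heappop(self._pending))
        return released


# Hypothetical entries, deliberately listed out of order.
msel = Msel([
    Inject(120, "Hospitals report bed shortages", "portal"),
    Inject(0, "Detonation reported", "phone"),
    Inject(30, "Initial casualty estimate available", "email"),
])
first_wave = msel.release_due(30)    # injects scheduled at minute 0 and 30
second_wave = msel.release_due(30)   # nothing new is due
third_wave = msel.release_due(120)   # the remaining inject
```

Keeping the pending injects in a heap means a controller only ever inspects the earliest unreleased entry, whatever order the list was authored in.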

2.3  How  SUMMIT  Supported  NLE10 

SUMMIT  was  one  of  several  M&S  providers 
for  NLE10,  supporting  both  exercise 
planning  and  execution.  The  IMMS  team 
used  SUMMIT  to  integrate  multiple  M&S 
tools  contributed  from  different  agencies. 
Threat,  casualty,  and  infrastructure  models 
and data were provided by the DOE
National  Atmospheric  Release  Advisory 
Center  (NARAC),  and  DHS  Homeland 
Infrastructure  Threat  and  Risk  Analysis 
Center  (HITRAC)  National  Infrastructure 
Simulation  and  Analysis  Center  (NISAC).  A 
medical  surge  model  from  the  Department 
of  Health  and  Human  Services  (HHS) 
Agency  for  Healthcare  Research  and 
Quality  (AHRQ)  provided  health  care 
resource  surge  requirements.  SUMMIT  was 
used  for  the  pre-planned  exercise  ground 
truth,  calculated  prior  to  the  exercise,  and 
the  real-time  scenario  injects,  computed 
during  the  exercise  execution. 

A new  SUMMIT  simulation  template  was 
created  specifically  for  the  NLE10  scenario 
(Fig. 3). The template integrated M&S tools
for  nuclear  effects,  infrastructure  effects  and 
medical  surge  needs.  The  NLE10  template 
defines  how  models  connect  and  the  data 
flows  between  models.  In  the  future,  this 
template  may  be  reused  to  support  similar 
exercises  at  the  federal,  state  or  local  level. 
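
To make the idea of a template that "defines how models connect and the data flows between models" concrete, the following is a minimal sketch. It is not the SUMMIT SDK: the function `run_template`, the model names, the data names, and every formula are hypothetical placeholders chosen only to show models being ordered by their data dependencies and run consecutively.

```python
from graphlib import TopologicalSorter


def run_template(models, inputs):
    """Run each model once, in dependency order, passing named data along.

    models maps a model name to (needs, produces, fn), where fn takes a dict
    of the needed values and returns a dict of the produced values.
    """
    # Map every output name to the model that produces it.
    producer = {out: name for name, (_, outs, _) in models.items() for out in outs}
    # A model depends on the models that produce its inputs; scenario
    # inputs supplied by the planner have no producer.
    graph = {name: {producer[n] for n in needs if n in producer}
             for name, (needs, _, _) in models.items()}
    data = dict(inputs)
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        needs, outs, fn = models[name]
        result = fn({n: data[n] for n in needs})
        data.update({o: result[o] for o in outs})
    return order, data


# Toy stand-ins for the three model types named in the text (made-up formulas).
toy_models = {
    "medical_surge": (
        ["casualties", "hospitals_offline"], ["surge_beds"],
        lambda d: {"surge_beds": d["casualties"] // 10 + 50 * d["hospitals_offline"]}),
    "nuclear_effects": (
        ["yield_kt"], ["casualties", "damage_radius_km"],
        lambda d: {"casualties": 1000 * d["yield_kt"],
                   "damage_radius_km": 0.5 * d["yield_kt"]}),
    "infrastructure_effects": (
        ["damage_radius_km"], ["hospitals_offline"],
        lambda d: {"hospitals_offline": int(d["damage_radius_km"])}),
}
order, data = run_template(toy_models, {"yield_kt": 10})
```

Because the execution order is derived from the declared inputs and outputs, a planner could swap in a different model for any step without rewriting the rest of the template, which is the kind of reuse the text anticipates.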


Figure  3.  A representation  of  the  NLE10 
template  depicting  input  parameters,  models, 
and  the  data  flow  between  models.  Outputs 
are  colored  to  match  the  model  from  which 
they  are  produced. 

For  exercise  planning,  SUMMIT  was  used 
to  provide  ground  truth  injects  on  the 
amount  of  surge  equipment  and  staff  that 
would  be  required  in  the  medical  response 
(Fig. 4). For exercise conduct, SUMMIT
was  used  to  provide  real-time  scenario 
injects  on  the  equipment  and  staff  needs 
estimated  by  on-scene  responders. 


Figure  4.  Example  output  data  from  models 
and  data  federated  in  SUMMIT  for  NLE10. 

3.0  DISCUSSION 

The  application  and  deployment  of  SUMMIT 
in  NLE10  provided  valuable  lessons  learned 
on  M&S  support  to  exercises.  These 
lessons  learned  are  being  incorporated  into 
SUMMIT  requirements  and  ConOps,  and 
will enhance SUMMIT's support of NLE11,
an  earthquake  scenario  in  the  New  Madrid 
Seismic  Zone  based  on  National  Planning 
Scenario  9:  Natural  Disaster  - Major 
Earthquake  [8]. 

NLE10  lessons  learned  include: 

SUMMIT  can  facilitate  the  use  of  M&S  in 
exercise  planning. 

By  having  an  integrated  framework  for  M&S, 
exercise  planners  were  able  to  more  easily 
run  various  scenarios  in  order  to  generate 
the  ground  truth  data.  Exercise  planners 
did  not  have  to  expend  time  to  locate  the 
individual  models,  execute  a series  of 
distributed M&S tools, and gather outputs.
SUMMIT  enabled  multiple  executions  to  be 
made  easily  so  that  exercise  controllers 
could  carefully  plan  and  scope  their 
exercise.  Furthermore,  the  template  may  be 
reused  by  exercise  planners  who  are  using 
this  same  National  Planning  Scenario. 

Exercise  controllers  require  a common  and 
consistent  picture  of  the  exercise. 
Maintaining situational awareness of
exercise  events  and  a common  exercise 
picture  across  the  dozens  of  exercise 
controllers  in  the  Master  Control  Cell  and 
SimCell  is  vital  for  sustaining  a realistic 
scenario.  During  NLE10,  there  were  several 
instances  in  which  a common  exercise 
picture  and  improved  situational  awareness 
would  have  been  valuable.  For  example,  at 
one  point  the  Master  Control  Cell  released  a 
scenario  inject  that  provided  incorrect 
ground  truth  data. 

SUMMIT  provided  some  enhanced 
situational  awareness  in  NLE10  by 
integrating  several  M&S  tools  into  a single 
conceptual  simulation  template.  A more 
comprehensive  common  exercise  picture 
can  be  provided  through  exercise 
management  tools  that  link  with  SUMMIT. 
Exercise  management  tools  provide 
timelines  of  exercise  injects,  expected 
player  actions,  and  actual  player  actions. 
They  show  how  player  actions  affect  the 
ground  truth  scenario.  The  exercise 
management  tools  used  in  NLE10  provided 
a global  view;  however,  according  to 
feedback  received  from  the  exercise 
controllers,  these  tools  were  neither  easy 
nor  intuitive  to  navigate  and  query. 
Additionally,  controllers  reported  that  there 
was  little  to  no  feedback  on  player  receipt  of 
injects  and  player  responses  to  injects.  The 
SUMMIT  architecture  is  being  designed  so 
exercise  management  tools  can  be 
integrated  with  M&S  tools  that  generate 
ground  truth.  This  will  help  ensure  a 
common  exercise  picture,  enabling  exercise 
planners  and  managers  to  record  and 
access  exercise  objectives,  scenario  ground 
truth  data,  expected  and  actual  player 
actions,  exercise  management  team 
actions,  consequences  of  player  actions, 
and  scenario  outcomes  in  one  location. 

Collaboration  environments  and 
visualization  tools  greatly  enhance  exercise 
planning  and  conduct,  but  must  be  user- 
friendly,  intuitive  and  robust. 


Visualization tools, such as GIS-based and
virtual  world  technologies,  can  be  used  to 
display  a common  exercise  picture  of  the 
exercise  scenario  data,  release  of  injects, 
player  actions  and  consequences  of  player 
actions.  For  exercise  planners  and 
controllers  who  are  not  techno-savvy  (which 
several  people  stated  about  themselves 
during  the  hot  wash  feedback  session 
immediately  following  NLE10),  a virtual 
world  or  other  visualization  should  make  it 
much  easier  for  them  to  interact  and  make 
changes  to  the  scenario  and  common 
exercise  picture.  The  SUMMIT  architecture 
is  providing  a means  for  virtual  world 
technologies  to  be  federated  with  M&S  and 
exercise  management  tools.  Exercise  data 
can  be  displayed  in  an  immersive 
environment  and  accessed  by  distributed 
exercise  planners,  managers  and  players. 

Scenario  data  coordination  and  consistency 
is  imperative. 

During  the  exercise  planning  phase,  M&S 
was  used  to  develop  ground  truth  data  and 
exercise  injects.  One  of  the  benefits  of  using 
M&S  for  this  purpose  is  to  help  ensure  that 
the  underlying  scenario  is  consistent  and 
realistic.  It  is  much  less  likely  for  exercise 
planners  or  controllers  to  create  conflicting 
scenario  data  when  the  ground  truth  data 
are  calculated  or  derived  from  a physics- 
based  model,  objective  data  and  a 
consistent  set  of  assumptions.  In  NLE10 
some  inconsistencies  in  the  ground  truth 
data  did  appear  because  of  the  use  of 
several  models  with  different  assumptions. 
For  example,  two  of  the  data  providers 
calculated  casualty  numbers  which  differed 
significantly  due  to  the  fact  that  different 
population  databases  were  used.  Through 
the  use  of  a unifying  M&S  framework, 
discrepancies  between  models  can  be 
managed  by  following  these  guidelines: 

1)  Models  with  the  same  inputs  and 
outputs  should  be  managed  in  a single 
simulation  template,  making  it  easy  to 
set  up  comparative  runs.  The  same 
template should be used to compute all
ground  truth  data. 

2) All assumptions and input parameters
should be documented, openly shared and
used among the data providers.

3) All M&S tools (including databases) that
are used to calculate scenario data should
be integrated.

To  minimize  inconsistencies  and  enhance 
coordination  in  scenario  development, 
SUMMIT  is  providing  an  integrating 
framework through which multiple models
and  datasets  can  be  used  together  to 
generate  consistent  data. 
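
The comparative runs recommended in guideline 1 can be sketched as follows. This is an illustrative assumption, not SUMMIT functionality: the provider names, the population densities, and the 10% tolerance are invented to mirror the NLE10 case in which two providers used different population databases.

```python
def compare_runs(models, shared_inputs, tolerance=0.10):
    """Run every model on the same inputs and flag a large relative spread."""
    results = {name: fn(**shared_inputs) for name, fn in models.items()}
    low, high = min(results.values()), max(results.values())
    spread = (high - low) / high if high else 0.0
    return results, spread, spread > tolerance


def make_casualty_model(people_per_km2):
    """Build a toy casualty model bound to one (hypothetical) population database."""
    def model(affected_area_km2, casualty_fraction):
        return round(affected_area_km2 * casualty_fraction * people_per_km2)
    return model


# Same formula and same inputs; only the assumed population density differs.
providers = {
    "provider_a": make_casualty_model(people_per_km2=4000),
    "provider_b": make_casualty_model(people_per_km2=3000),
}
results, spread, flagged = compare_runs(
    providers, {"affected_area_km2": 10.0, "casualty_fraction": 0.2})
```

With the inputs held fixed, any spread between the results is traceable to the models' internal assumptions, here the population data, which is what documenting and sharing assumptions (guideline 2) is meant to expose.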

4.0  CONCLUSIONS 

NLE10  provided  important  lessons  learned 
on  architecture  requirements  and  ConOps 
for  SUMMIT.  These  are  being  implemented 
in  SUMMIT  and  will  help  enhance  exercise 
planning  and  conduct  in  NLE11.  Research 
on SUMMIT support to Tier II-IV exercises
is  underway  and  will  build  upon  the  lessons 
learned  from  support  to  NLEs. 

The  SUMMIT  architecture  has  proven  to  be 
flexible  enough  to  create  simulations  via 
model  federation  that  allow  for  complex 
scenario  construction  in  an  intuitive  manner. 
The  flexibility  and  extensibility  of  the 
architecture  also  allows  for  evolutionary 
growth  with  participation  of  the  M&S  and 
exercise  communities.  Integration  of  data 
visualization  tools  and  virtual  environments 
allows  M&S  data  and  results  to  be  readily 
accessed  by  the  exercise  community. 

A SUMMIT  early  adopter  program  has  been 
established  to  evaluate  the  integration 
process  with  the  participation  of  model 
contributors  in  the  M&S  community. 
Information  about  SUMMIT  and  this 
program  can  be  found  at  the  SUMMIT  web 
site  (http://dhs-summit.com). 

The  current  focus  for  SUMMIT  is  support  for 
emergency  response  exercises;  subsequent 
research  will  focus  on  emergency  planning 
and  response  operations.  Bringing  modeling 
and  simulation  tools  to  emergency  planning 


and  operations  will  allow  for  improved 
accuracy  in  exercise  parameters,  creating 
more  realistic  training  exercises  and  better 
prepared  emergency  responders. 
4.6 Virtual Worlds and Homeland Security

Agenda


• Overview 

• Requirements 

• Applications 


Avatars 


• Avatars  mimic  natural  human 
movement 

- Controlled  via  simple 
keyboard  or  controller  input 

- Avatars use realistic
animations  and  advanced 
blending  techniques 

- Emotion  and  expression 
framework  combines  user 
input  and  scripted  behavior  to 
mimic  culturally  specific 
movement  patterns 

- Integrated  physiology  model 

- FaceGen integration provides
photo-specific avatars


Physics 


• Programmable  physics 
engine  adapts  to  network 
latency 

- The  physics  engine  can  be 
programmed  to  simulate 
real-world  dynamics 

- Simulation  is  accurate, 
validated  on  the  server 


Networking 


• Supports distributed operations

- Users login to the virtual environment
from remote locations across the globe
and participate just as they would if
they were co-located

• Networking engine minimizes
bandwidth requirements

- Efficient communication protocol
minimizes necessary bandwidth,
allowing simulation to run over LANs,
WANs, and the Internet (such as
long-haul networks)


LANs = local area networks; WANs = wide area networks

OLIVE is a trademark of Science Applications International Corporation in the U.S. and/or other countries.
In-World Communication


• Multiple forms of
communication between users

- Spatially accurate voice-over-IP
(VoIP)

- Highly  integrated  voice 
communications  with  lip  sync, 
automated  gesticulations  and 
speaker  attention 

- Instant  messaging  (broadcast 
or  person  to  person) 

- Built-in  radio  communication 

- Manual  hand  signals  and 
gestures 

- Culturally  specific  library 
integration 

- Telephony  for  external  access 


Collaboration  Features 


• Supports  in-world  presentation 
screens  that  support  a variety 
of  rich  media 

- PowerPoint®

- Streaming  video 

- Live  streaming  video 

- Application  sharing 

• Multiple  screens  can  be  placed 
throughout  the  world 

• Prompter,  zoom  support 

• Laser  pointers 

• Presence  indicators 
Geospecific Terrain


• Supports large-area,
geospecific terrain

- CDT SE CORE
databases

- WGS-84  Datum 

- OpenFlight 
interoperability 

- Double  precision 
processing 


CDT SE CORE = Common Driver Trainer Synthetic Environment Core


Session  Replay 


• Built-in distributed replay

- Collects  all  voice, 
keyboard/mouse  and 
controller  inputs  across  the 
system 

- Plays  results  back  through 
system,  allowing  free-cam 

- Full  data  mart  for  external 
analysis 

- VCR  playback  features 

- Distributed  camera  control 
Non-Player Characters

• General,  open  API  for 
integrating  external 
artificial  intelligence 

- Ability  for  external 
application  to  instantiate 
and  control  entities 

- API  provides  information 
on  in-world  activity  to 
external  application 

- Support  for  low-level-of- 
detail  avatars  for  crowd 
scenes 

- Can  also  be  used  to 
support  real-time  telemetry 

API = application programming interface


Special  Effects 


Supports  a variety  of 
special  effects  to  add 
realism  to  the  scene 

- Particle-based  effects  for 
natural phenomena

- Hold  tools  to  build  items 
with  which  avatars  can 
interact 

- Time  of  day  and  weather 
support 

- Full  suite  of  weapons, 
including  small  arms  and 
rocket-propelled  grenades 
(RPGs) 
Enterprise IT

• Working  to  support  deployment  challenges 

- Full  support  for  behind  the  firewall  operation 

- Port  multiplexing  to  support  single  port 
communication  through  firewalls 

- Lightweight  Directory  Access  Protocol  (LDAP) 
integration 

- Integration  with  eAuthentication  to  support  Level  2 
authentication 

- Secure  Socket  Layer  (SSL)  encryption  available 
between  server  and  client 
• Persistent room

• Screen  placement 
optimized  for  team 
use 

• Team  documents 


Project  Management 




• Realistic  and 
hypothetical  scenarios 

- Scenes 

- Simulations 

- Role players

- Non-player characters

• Scenario  and  Scene 
Editor 

• CBT or SCORM
integration

- Instructor-led
- Self-paced

• Record  and  replay 

• Data  mart 


Training 


CBT = computer-based training

SCORM is a registered trademark of the Department of Defense in
the U.S. and/or other countries.
Operational Solutions


• Virtual  emergency 
operation  centers 

• Common  operating 
picture 

• Context-specific 
operation  centers 

• Connection  to  real 
world  - GPS,  RFID 
and  other  sensors 

• Embedded  rehearsal 
environments 


GPS = Global Positioning
System

RFID = radio frequency
identification
5.0 THE HUMAN DIMENSION TRACK


5.1 The Army's Human Dimension

5.2 Human Performance Modeling Tools for Better System Design

5.3 ACT-UP: A Toolkit for Hampton, Cognitive Modeling Composition, Reuse and Integration

5.4 Examining the Relationships Between Education, Social Networks and Democratic Support With ABM

5.5 Connections, Fallacies, and Potential Directions

5.6 Design and Evaluation of a Cross-Cultural Training System

5.7 Technophiles to Newbies: The Challenge of Supporting Distributed Teams to Maintain Engagement in Virtual Worlds

5.8 A Systematic Approach for Engagement Analysis under Multitasking Environments

5.9 Modeling Pilot State in Next Generation Aircraft Alert Systems

5.10 Imbalanced Learning for Functional State Assessment

5.11 Predicting the Consequences of Workload Management Strategies with Human Performance Modeling

5.12 Simulating Visual Attention Allocation of Pilots in an Advanced Cockpit Environment

5.13 Modeling Being “Lost”: Imperfect Situation Awareness

5.14 Investigating Intrinsic and Extrinsic Variables During Simulated Internet Search

5.15 Following Human Footsteps: Proposal of a Decision Theory Based on Human Behavior

5.16 Use of Inverse Reinforcement Learning for Identity Prediction

5.17 Social-Cognitive Biases In Simulated Airline Luggage Screening

5.18 EEG Artifact Removal Using A Wavelet Neural Network




5.1  The  Army’s  Human  Dimension 


The  Army’s 
Human  Dimension 


COL  Steven  Chandler 


Chief,  Human  Dimension  TF 
Army  Capabilities  Integration  Center 
TRADOC 


Army Strong


Cognitively, Physically
& Socially
Director


ARCIC 


The  Army  Capabilities  Integration  Center 
(ARCIC) designs, develops, integrates and
synchronizes force capabilities for the Army
across the DOTMLPF imperatives into a Joint,
Interagency, and Multinational operational
environment from concept through capability
development


ARCIC 
Forward  (DC) 


External Agencies


Air  Land  Sea 
Application 
(ALSA) 


Reserve 

Component 


Service  LNOs 
Study


Joint  & Army 
Senior  Leaders 
Congressional 
Staffers 


Concept 


General 

Audience 


Handout 


Given the requirement for Full Spectrum Operations in an Era of Persistent Conflict with the demands
of the Army Force Generation cycle (Reset-Train/Ready-Available) and that Soldiers are the
centerpiece of our formations, human capabilities are the key to winning our current and future
wars...

The Operational Problem: The Army must focus our Human Dimension efforts to ensure: sustained
quality of the All-Volunteer Force, trained Soldiers, Civilians, Leaders and units prepared for Full
Spectrum Operations (FSO), a resilient force Reset and Trained/Ready for deployment and
prepared for complex and demanding Joint, Interagency, Intergovernmental, and Multi-national
(JIIM) environments now and in the future.


Requires  an  adaptive  institution 


http://www.tradoc.army.mil/tpubs/pamndx.htm
Previous Army concepts acknowledge the Soldier as the centerpiece of our formations, but none, individually or
collectively, adequately addresses the human dimension capabilities for current or future operations.


Using  available  and  emerging  tools  that ... 

- optimize cognitive flexibility, mental intellect and information processing through
enhanced screening, recurring assessment and tracking of individual's potential and
attributes; dynamic, scalable, adaptive, immersive, sensory enabled, multi-layered
tailored training; Adaptive material systems maximizing individual attributes.

- develop  lifelong  total  fitness  habits  through  comprehensive  wellness  programs  that 
build  aerobic/mental  capacity,  strength,  endurance,  agility,  focused  nutrition,  stress  and 
sleep  deprivation  management,  behavioral  health.  Build  resilience  thru  the  Physical, 
Mental,  Family,  Social  and  Spiritual  domains  of  strength. 

- strengthen  character  and  intercultural  adaptability  that  reflects  confidence  in  tough 
moral,  culturally  sensitive  situations  grounded  in  law,  Warrior  Ethos  and  Army  Values; 
develop  improved  understanding  of  social  / family  dynamics,  respect,  interpersonal 
relationships,  spirit  and  faith;  strengthen  team  building,  foster  cohesion 


The  future  domestic  / global  operating  environment  requires  agile  policies  to 
support  a comprehensive  Human  Dimension  approach. 


Cognitive 


Physical 


Social 
A recruited Future Force managed and retained based, in part, on
continuous cognitive assessments (e.g., Attention, Learning, Leadership
potential, Adaptability, Decision Making, Vision). Advanced technologies and
tools assist in the selection of individuals for assignments and advanced
accelerated / measurable training. Enhanced tailorable multi-layered training
(to  facilitate  and  accelerate  task  learning),  leader  development  and  Mission 
Command  systems  that  adapt  to  cognitive  styles  to  maximize  readiness. 
Leaders  provided  PDA-like  devices  that  access  training  and  decision-making 
tools  to  track  soldier  readiness  and  assist  matching  talents/skills  to  mission 
requirements. 


Facts: 

• 100 bil neurons, .6 cft, uses 2w/hr vs. Super Computer @ 1600 sft, 5000w just for
cooling

• Everyone "wired" differently - More synaptic connections than all known bodies in
the universe, i.e., billions

• 2% body mass, yet consumes 20% of the energy - alert or asleep

• 3 seasons of the brain - Maturing, Adult, Aging

• Spatial Navigation differences - Men and Women ARE different

• Cognitive Peaks differ individually for D-making


Possibilities: 


- Predict leadership potential and decision-making capabilities

- Identify cognitive styles, special skills and attributes

- Cognitive gym - Cognitive PT Test that builds capability & experience
- Cognitive UCOFT - exercise full spectrum skills
- Accelerate learning - tailored to individual potential and preferences
- Train the untrainable?


UCOFT  - Unit  Conduct  Of  Fire  Trainer 


Maximize  a Soldier’s  inherent  cognitive  potential  and  learning 
A Future Force that adheres to a continuum of holistic fitness
tailored  to  the  individual  and  subsequent  mission  requirements 
(measurable  physiological,  neurological,  psychological,  nutritional,  and 
developmental  fitness  training).  Programs  that  identify,  mitigate,  treat  and 
rapidly  restore  soldiers  who  become  holistically  "unfit”  due  to  combat 
operational  and  stress-related  injuries.  Retention  of  qualified  physically 
disabled  soldiers  is  the  norm. 


Measuring  excessive  fatigue 
Office of Comprehensive Soldier Fitness, HQDA G-3/5/7


Global  Assessment  Tool  (GAT) 



Master  Resiliency  Program 

Possibilities: 

• Effective resilience building and stress mitigation

• Identify combat stress / PTSD vulnerability

• Soldiers physically, mentally, emotionally, socially fit


Finding  humor 
in  something 
that  has  gone 
wrong ... 


Finding 
strength  to 
work  thru 
adversity...


Maximize a Soldier's inherent physical potential and holistic health.


A Future  Force  that  functions  and  behaves  in  accordance  with:  law; 
Army  Values;  and  national/international  expectations  and  standards. 
Grounded  by  a continuum  of  adaptable,  scalable  and  measurable  training 
programs that include operational challenges in tough ethical/moral
situations. Leaders achieving intercultural adaptability, language skills and
respect of the potential strategic impact of how ethical behavior affects one's
self, the Army and Family values.


Soldiers  and  leaders  must  feel  confident ...  to  interact  day-to-day  with  people  of  different  cultural  backgrounds  and 
perspectives... GEN Casey
Warrior's

Spirit/Faith


Cultural

Understanding


Professional 

Ethic 


Language 

Development 


Moral 

Values 


Maximize  a Soldier's  inherent  social  potential  as  a culturally  astute  warrior  and  world  citizen. 


The Character of Soldier


Possibilities: 

• Identify  aptitude  for  language,  respect  for  cultural  differences,  openness;  establishing  trust 

• Each Soldier, DA Civilian and Family member an integral component of a social network
EMERGING Tools and Methods


Methods 


- Psychological
assessments


- Physiological
assessments


Brain  Connective 
Topology 


EEG  & fMRI  Functional 
Mapping 
• Enable the Future Force to impart more skills, faster, at lower cost and with
greater  retention  than  currently  achievable 


• Use  non-traditional  home  station  training  techniques  and  technology,  train 
prior  to  employment 


Enhance  and  account  for  individual  proficiencies 
(outcome  based) 


Leader  development  must  be  completely 
adaptable,  scalable,  multi-layered  in  complexity 


1. The measurement and assessment of human performance is a
centerpiece, not an ancillary benefit.

2. LVC training must contain immersive, decision-making stimuli with
increasing variables that do not replicate what was done before, nor
permit unintended bad habits.

3. Tactical, moral, ethical decision-making must be stressed and pervasive
in all training. Such contexts lead to individual and small unit
self-confidence. It should also have open-ended objectives and challenge the
task-condition standard construct.

4. HD brings a NEW way of accessing, selecting, training, developing and
transitioning  Soldiers  requiring  evidence-based,  measurable  results  that 
can  be  correlated  to  return  on  investment. 
We are about

Empowering Soldiers / units to dominate the Land Domain

• Improving, optimizing and restoring cognitive, physical, and social
abilities

• Enabling Soldiers to function efficiently as an integral component of a
network and society
Taking the Capability Approach

Outcome: An Army poised to achieve its maximum cognitive, physical, and social
potential for Soldiers, Civilians, and Families by optimizing their abilities,
experience,  education  and  fitness. 


Integrated  Capability  Development  Team  (ICDT) 


DCR - DOTMLPF-P Change Request; DICR - DOTMLPF-P Integrated Change Request; ICD - Initial Capabilities Document;

CNA - Capability Needs Analysis


HD Quad Chart


HD ICD Outcomes nested with
The Army Capstone Concept:


Operational  Adaptability 

• Quality Army Leaders and forces

- Applied critical thinking
- Comfort w/ ambiguity & decentralization
- Accepting prudent risks
- Continuous assessments & rapid adjustments
- Educated Leaders culturally astute

• Proficient & Cohesive Teams

• Resilient Soldiers successfully enduring psychological
& moral challenges


CPS


Context:

HD reaches across Army forces, warfighting functions and various
Force Modernization proponents.

No organization or process to develop holistic resource-informed,
outcomes-based, integration-focused HD CPS capabilities.

AR 5-22, The Army Force Modernization Proponent System, does not
designate a Force Modernization Proponent for HD.

Many  disparate  organizations  are  developing  HD-related  capabilities  in 
parallel  and  stove-piped  manner  challenging  the  effort  to  collect  gains  made 

No  HD  Center  of  Excellence  (CoE)  or  Capabilities  Development 
and  Integration  Directorate  (CDID) 


Transformational endeavor requiring proponency to
integrate CPS attributes across the Army

Continued investment in R&D of CPS measures is
required to determine and predict Soldier potential,
performance and resiliency

Policy  changes  are  required  for  acquisition,  selection,  development, 
retention,  career  management  and  transition 


Recommendation:

Establish HD as a Program of Record

Establish a HD management structure having Force Modernization
proponency for HD resourced with the TCM and CDID

Charter a Senior Advisory Group to facilitate the effective and efficient
enabling of research, development and experimentation

Redefine Soldier readiness in CPS terms

Add squad/small unit readiness as a pacing item

Utilize  the  Human  Capital  Enterprise  (HCE)  to  support  HD  equities 
High School,
College

Trained Citizens


Cognitive

Physical

Social

Policy
5.2 Human Performance Modeling Tools for Better System Design


MODSIM World
Conference & Expo


October 13-15, 2010
Hampton, Virginia


Human Performance Modeling Tools
for Better System Design


Charneta Samms
U.S. Army Research Laboratory


U.S. Army

RDECOM

Agenda

• Why Human Performance Modeling (HPM)?

• Importance of HPM to System Design

• HPM Tools and Applications

- CogTool

- C3TRACE

- IMPRINT

- MIDS Plug-in

• Expansion of tools

• Summary


Why Human Performance
Modeling (HPM)?


Concept  System 

Many  Variables 

Field  Study  Not  Feasible 

Too  Dangerous 

Model  - Test  - Model 

System Performance = f(human performance)


Importance  of  HPM  to  System 
Design 


User  Needs 


Technology  Opportunities  & Resources 


Under Secretary of Defense for Acquisition, Technology, and Logistics. (2008). Department of
Defense Instruction (DoDI) 5000.02, Operation of the Defense Acquisition System.
Available online: http://www.dtic.mil/whs/directives/...


[Figure: Defense acquisition life cycle. Phases: Materiel Solution Analysis; Technology Development; Engineering and Manufacturing Development; Production & Deployment; Operations & Support. Marked events: Milestone A, Program Initiation, IOC, FOC, LRIP/IOT&E, Sustainment. Groupings: Pre-Systems Acquisition; Systems Acquisition. Legend: decision point; milestone review; decision point if PDR is not conducted before Milestone B.]


Provide  quantitative  data  to  inform  trade  off 
decisions  early  in  design  process 


CogTool 


1. Use it to compare design alternatives on a suite of tasks


General purpose UI prototyping tool
Automatically evaluates design with
predictive human performance model
"cognitive crash dummy"


Bonnie E. John, Principal Investigator
Human-Computer Interaction Institute
School of Computer Science
Carnegie Mellon University
http://cogtool.org


Examine what the ACT-R model did to produce the prediction
in an interactive timeline visualization.


John, B. E. (2010). CogTool: Predictive Human Performance Modeling by Demonstration. Proceedings of the 19th Annual Conference on Behavior
Representation in Modeling and Simulation (BRIMS) (Charleston, SC, March 21-24, 2010).




CogTool  Application 




Compared  time  to 
complete  programming 
tasks  within  two 
environments 

• 2002  - Unix  command 
line  and  Vim  editor 

• 2010  - Eclipse  Parallel 
Tools  Platform  (PTP) 




Results 

Eclipse  PTP  interface  will  improve 
performance  of  skilled  programmers 
over  2002  command  line  interface 


Tasks                              Command Line     PTP-Eclipse

HelloWorld_mpi                     Min: 114.405 s   Min: 40.090 s
  HelloWorld_mpi with keyboard     161.111 s
  HelloWorld_mpi with mouse        114.405 s        40.090 s

F1 Help                            Min: 30.726 s    Min: 10.462 s
  F1 Help with keyboard            30.726 s
  F1 Help with mouse               31.247 s         10.462 s

Code Folding                       Min: 5.780 s     Min: 3.563 s
  Code folding with keyboard       10.490 s
  Code folding with mouse          5.780 s          3.563 s

Barrier Analysis                   40.149 s         11.633 s
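
The predicted minimum task times reported in these results can be turned into speedup ratios with simple division. The Python sketch below is illustrative only: the dictionary layout and the function names are assumptions made for this example and are not part of CogTool or of the cited study.

```python
# Predicted minimum task times in seconds, taken from the results above.
# The layout (command line, Eclipse PTP) is an assumption for this sketch,
# not a CogTool data format.
PREDICTED_MIN_SECONDS = {
    "HelloWorld_mpi": (114.405, 40.090),
    "F1 Help": (30.726, 10.462),
    "Code Folding": (5.780, 3.563),
    "Barrier Analysis": (40.149, 11.633),
}


def speedup(command_line_s: float, eclipse_s: float) -> float:
    """Return how many times faster the Eclipse PTP prediction is."""
    return command_line_s / eclipse_s


def speedups(times=PREDICTED_MIN_SECONDS) -> dict:
    """Map each task name to its predicted speedup ratio."""
    return {task: speedup(cl, ptp) for task, (cl, ptp) in times.items()}


if __name__ == "__main__":
    for task, ratio in speedups().items():
        print(f"{task}: {ratio:.2f}x faster in Eclipse PTP")
```

Every ratio is greater than one, which is consistent with the slide's conclusion that the Eclipse PTP interface should improve skilled-programmer performance on these tasks.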

Richards, J., Bellamy, R., John, B., Swart, C., & Thomas, J. (2010). Using CogTool to Model Programming Tasks. Psychology of
Programming Interest Group WIP (PPIG-WIP) (Dundee 2010), www.ppig.org.
C3TRACE


Command, Control, and Communications - Techniques for the Reliable
Assessment of Concept Execution


Goal: To conduct "what-if" analyses based on
information flow and quality, to discover alternative
organizational, personnel, and system
configurations that increase performance

• Evaluate effects of different
personnel architectures and
information technology on system
and human performance

• Investigate efficiency and
effectiveness of message
processing in Command &
Control environments



C3TRACE  Application 


Future  Command  and  Control  Cell  Analysis 


System 

Issue 


Requirements  and  Force 
Design  are  in  conflict 


1. Modify deployment concept

   Different aircraft

2. Accept degradation in 16 soldier C2 cell

   Reduced capability

3. Accept more C2Vs

   More money, maintenance, lifts


Results 

Performance  Measures 

Utilization 

Probability of "good decision"
Messages handling


6 Cell Configuration


• 19 of 24 - 100% utilization              • 13 of 24 - 100% utilization

• 6 of 24 - 25+% poor decision quality     • 5 of 24 - 25+% poor decision quality

• 18 of 24 - dropped 50+% of messages      • 8 of 24 - dropped 50+% of messages


Mitchell, D. K., Samms, C., Kozycki, R., Kilduff, R., Swoboda, J., & Animashaun, A. (2006). Soldier Mental Workload, Space Claims, and Information Flow
Analysis of the Combined Arms Battalion Headquarters Command and Control (C2) Cells (ARL-TR-3861). Army Research Laboratory, APG, MD.
Improved Performance
Research Integration Tool


334 users supporting Army, Navy, Air Force,
Marines, NASA, Department of Homeland
Security (DHS), Department of Transportation
(DoT), Joint and other organizations
across the country


http://www.arl.army.mil/IMPRINT

https://km3.alionscience.com/sites/imprint


IMPRINT  can  be  used  to 


• Set  realistic  system 
requirements 

• Identify  future  manpower  & 
personnel  constraints 

• Evaluate  operator  & crew 
workload 

• Test  alternate  system-crew 
function  allocations 

• Assess  required 
maintenance  man-hours 

• Assess  performance  during 
extreme  conditions 


• Examine  performance  as  a 
function  of  personnel 
characteristics  and  training 
frequency  & recency 

• Identify  areas  to  focus  test  and 
evaluation  resources 

• Quantify  human  system 
integration  risks  in  mission 
performance  terms  to  support 
milestone  review 

• Represent  humans  in 
federated  simulations 


IMPRINT  is  a trade-off  analysis  tool 
IMPRINT Application


System

Issue


Methodology

• Identified functions and tasks via knowledge elicitation

• Set up experimental conditions to model based on varying function allocations

• Built models

• Validated models by walking through with Soldiers

• Completed runs and prepared results


Results 


[Figure: results for four crew function allocations: Commander - Driver and Gunner; Gunner - Driver and Commander; Commander - Gunner and Driver; Commander, Driver and Gunner.]

Mitchell, D. K., Samms, C. L., Henthorn, T., & Wojciechowski, J. Q. (2003). Trade study: A two- versus three-Soldier crew for the Mounted Combat System
(MCS) and other future combat system platforms (Technical report ARL-TR-3026). Aberdeen Proving Ground, MD: U.S. Army Research Laboratory.


A Decade  of  Impact  on 
Soldier-System  Integration 


Combined  Arms  Testbed 

First identification of workload issues
associated with a 2 Soldier common crew


Command and Control Cell

Supported requirement for 24
personnel allocated to the battalion in
the Unit Reference Sheet


Lightweight Howitzer
Supported the possibility of
reducing crew size


Future Howitzer

Workload issues associated with
rearming  resulted  in  an 
automated  rearming  concept  to 
be  included  in  system  design 


Autonomous 
Navigation  System 

Provided  support  for  the  ANS 
technology  to  increase  crew 
performance 


1999 


2001 


2003


2005


2007 


2009 


Situational  Understanding 
STO 

Identified critical information
requirements for system and
display development


Future Tank Platoon
Leader Variant

High workload analysis
predictions matched
experimental results


All Future Concept
Vehicle analyses

Soldier workload identified as
#1 issue during preliminary
design review


Future  Tank 

Identified  workload  issues  which 
resulted  in  system  design  change 
from  2 to  3 Soldier  crew 


Future  Reconnaissance  and 
Surveillance  Vehicle 

Served  as  basis  of  manning 
assessment and justified need for all
operators  to  have  displays 
Multimodal Information Design
Support (MIDS) Tool Plug-in

• Develop potential mitigation strategies from multimodal
design guidelines matched to areas of high workload as
identified in IMPRINT


Expansion of Tools

• Develop  smart  “links”  between  tools 

• Keep  up  with  evolving  analysis  demands 

• Specific  Enhancements 

- CogTool 

• Additional  measures 

- C3TRACE 

• Visualization  of  impacts  to  decision  quality 

- IMPRINT 

• Connect  to  system  engineering 

- MIDS  Plug-in 

• Predict  effect  of  incorporating  mitigation  strategies 
Summary


• Use  of  human  performance  modeling  tools  can 

- Provide  quantitative  data  to  inform  trade  off  decisions 
early  in  design  process 

- Cost  savings 

- Better  design 

- Focus  test  and  evaluation  resources 

- Model  - Test  - Model  approach 

• Expand  tools  to  answer  new  analytic 
5.3 ACT-UP: A Cognitive Modeling Toolkit for Composition,
Reuse and Integration


ACT-UP:

A Cognitive Modeling Toolkit for
Composition, Reuse and
Integration


Christian Lebiere and David Reitter
Carnegie Mellon University
cl@cmu.edu, reitter@cmu.edu


ACT-R  Cognitive  Architectures 


• Computational 
implementation  of 
unified  theory  of 
cognition 

• Commitment  to  task- 
invariant  mechanisms 

• Modular  organization 

• Limited  capacity 

• Hybrid  symbolic 
statistical  processes 
Motivations and Applications

• Philosophy: Unified understanding of the mind.

• Psychology: Account for experimental data.

• Education: Provide cognitive models for
intelligent tutoring systems and other learning
environments.

• Human Computer Interaction: Evaluate
artifacts and help in their design.

• Computer Generated Forces: Provide cognitive
agents to inhabit training environments & games.

• Neuroscience: Provide a framework for
interpreting data from brain imaging.


• Enable  the  implementation  of  more  complex 
ACT-R  models 

• Scale  up  cognitive  models  to  simulate  learning 
/ adaptation  in  communities 

(e.g.,  about  1,000  models  in  parallel) 

• Treat  models  as  hard  claims 

- Evaluate  each  specified  component  against  data 

- Underspecify  the  rest  and  fit  free  parameters 
The Argument


• Constraints:  Architectural  advances  require 
further  constraints 

• Scaling  it  up:  Complex  tasks,  broad  coverage 
of  behavior  (e.g.,  linguistic),  use  of 
microstrategies  and  predictive  modeling  may 
serve  to  motivate  further  architectural 
constraints 

• Difficulties:  ACT-R  is  heavily  constrained 
already,  and  models  are  difficult  to  develop, 
reuse  and  exchange 


Control  Structure 


A flow-chart  describes  an 
algorithm  (or  a cognitive 
strategy) 

Decision-making  points 
and  states 

Not  easy  to  reuse:  it  fails 
to  capture 
generalizations 

Computer  Science: 
pre-Object  Orientation, 
pre-Functional 
Programming 
Decomposition


IF  THEN 


IF  THEN 
The Argument

• Constraints: Architectural advances require further
constraints 

• Scaling  it  up:  Complex  tasks,  broad  coverage  of 
behavior  (e.g.,  linguistic),  use  of  microstrategies  and 
predictive  modeling  may  serve  to  motivate  further 
architectural  constraints 

• Difficulties:  ACT-R  is  heavily  constrained  already,  and 
models  are  difficult  to  develop,  reuse  and  exchange 

• We  need  to  produce  models  at  a higher  abstraction 
level 

- However,  we’d  like  to  leverage  successful  cognitive 
modules,  describing  memory  retention,  cue-based 
retrieval,  routinization,  reinforcement  learning 




Cognitive  Strategy 


Subsymbolic 
(Learning  / 
Adaptation) 


non-deterministic 
explains  empirical  variance 
Priming Model


Crucial  request  of  a chunk  from  declarative  memory 


• Only  a small  portion  of  the 
model  explains  the  behavioral 
data  at  hand 

• The  rest  explains  that  the  task 
can  be  accomplished  in 
principle  with  a parallel 
architecture  and  with  specific 
cognitive  representations 
(chunk  types) 


Production Systems vs.
assembly language


; Decrement
; counter by one
sumloop ; until it

#2,D1 ; Double sum to account


bne


rts


; for even numbers
; Return
; to caller


1990
The Argument


• Constraints: Architectural advances require further
constraints

• Scaling it up: Complex tasks, broad coverage of behavior
(e.g., linguistic), use of microstrategies and predictive
modeling may serve to motivate further architectural
constraints

• Difficulties: ACT-R is heavily constrained already, and
models are difficult to develop, reuse and exchange

• Abstraction: To implement those, we need to produce
models at a higher abstraction level

• Underspecification is the key to focus on verifiable
claims, and to avoid overfitting by fitting free
parameters to data


Underspecified Models


specify:

non-deterministic
explains empirical variance
Buffers as interfaces and
as a form of working memory
(e.g., Goal, Retrieval buffers)


Perceptual/Motor

Modules


Procedural Memory
(if-then rules)


Declarative Memory
(storage and retrieval of chunks)


ACT-UP 


• A stand-alone  system  on  the  basis  of  Common 
Lisp 

• targets  an  audience  that  can  write  simple  Lisp 
programs  (unlike,  e.g.,  CogTool) 

• Toolbox  approach  to  ACT-R 

- light-weight:  it’s  a Lisp  library 

- does not produce production rules (ACT-R/Lisa,
ACT-Simple, CogTool)

• Not  aimed  at  implementing  all  constraints  of 
ACT-R  6 (unlike  Java  ACT-R,  Python  ACT-R) 




Declarative 

Memory 

'define-chunk-type'

- types are optional

'make-count-order'

'learn-chunk'

'defrule'

'retrieve-chunk'

'count-order-second'


ACT-UP Code


;; ACT-R parameters
(setq *lf* .05)
(setq *rt* -1)

;;;; chunk type
(define-chunk-type count-order first second)


ACT-UP is not ACT-R 6...

• ACT-UP  Interface  is  synchronous 

- Serial  execution 

- Deterministic  strategies  defined  as  programs 

• Parallelism  (e.g.,  perceptual/motor  modules) 
possible  [not  implemented] 

• Non-deterministic  rule  choice  is  possible 

- Reinforcement-learning  as  in  ACT-R  6 




PM  / utility  learning 


'choose-coin'


calls either 'decide-heads'
or 'decide-tails'

'assign-reward'
reinforces
the decision

Exact  production  rules 
are  underspecified 
- but  decision-making 
point  is  explicit 
Choice  model 
replicates  ACT-R  and 
empirical  results 


;; Experimental environment
(defun toss-coin ()
  (if (< (random 1.0) .9) 'heads 'tails))

;; The Model

;;;; Rules that return the choice as symbol heads or tails

(defrule decide-tails ()
  :group choose-coin
  'tails)

(defrule decide-heads ()
  :group choose-coin
  'heads)
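
The slide's choose-coin group picks between two rules by utility and reinforces the choice through assign-reward. The Python sketch below illustrates that idea with the standard ACT-R difference-learning update, U <- U + alpha * (R - U), and logistic utility noise. It is a minimal sketch and not ACT-UP code: the class name, the parameter values (`alpha`, `noise_s`), and the 1/0 reward scheme are assumptions chosen for illustration, and time-discounting of rewards is omitted.

```python
import math
import random


class UtilityChooser:
    """Pick among competing rules by noisy utility, then learn from rewards."""

    def __init__(self, rules, rng, alpha=0.1, noise_s=0.5):
        self.utility = {rule: 0.0 for rule in rules}
        self.rng = rng
        self.alpha = alpha      # learning rate (assumed value)
        self.noise_s = noise_s  # logistic noise scale (assumed value)

    def _noise(self):
        # Sample from a logistic distribution with scale noise_s.
        p = self.rng.uniform(1e-9, 1.0 - 1e-9)
        return self.noise_s * math.log(p / (1.0 - p))

    def choose(self):
        # The rule with the highest noisy utility wins.
        return max(self.utility, key=lambda r: self.utility[r] + self._noise())

    def assign_reward(self, rule, reward):
        # Difference-learning update: U <- U + alpha * (R - U).
        self.utility[rule] += self.alpha * (reward - self.utility[rule])


def run(trials=1000, seed=1):
    """Simulate guessing a coin that lands heads 90% of the time."""
    rng = random.Random(seed)
    model = UtilityChooser(["decide-heads", "decide-tails"], rng)
    heads_choices = 0
    for _ in range(trials):
        outcome = "heads" if rng.random() < 0.9 else "tails"
        rule = model.choose()
        guess = "heads" if rule == "decide-heads" else "tails"
        model.assign_reward(rule, 1.0 if guess == outcome else 0.0)
        heads_choices += guess == "heads"
    return model.utility, heads_choices / trials


if __name__ == "__main__":
    utilities, heads_rate = run()
    print(utilities, heads_rate)
```

As on the slide, the exact rule conditions stay unspecified; only the decision point and the reward signal are modeled. Because of the utility noise, the sketch keeps choosing tails some of the time instead of always picking the better rule.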
Debugging


(defrule count-model (arg1 arg2)
  "Count from ARG1 to ARG2.
ARG1 is the starting point and ARG2 is the ending point.
Each increment is 1 unit."
  (speak arg1)
  (if (not (eq arg1 arg2))
      (let ((p (debug-detail (retrieve-chunk (list :chunk-type 'count-order
                                                    :first arg1)))))
        (if p
            (count-model (count-order-second p) arg2)))
      arg2))
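
The control flow of this counting rule (speak the current number, retrieve the chunk that names its successor, recurse) can be sketched in Python. This is a toy illustration under stated assumptions, not ACT-UP: the list-of-dicts memory, `retrieve_chunk`, and the returned list of "spoken" numbers are hypothetical stand-ins, and retrieval here is an exact match, with no activation or noise.

```python
from typing import Optional

# Toy declarative memory: each "count-order" chunk links a number to its
# successor. The range 0..9 is an arbitrary choice for this sketch.
COUNT_ORDER_CHUNKS = [{"first": n, "second": n + 1} for n in range(0, 10)]


def retrieve_chunk(first: int) -> Optional[dict]:
    """Return the chunk whose 'first' slot matches, or None on failure."""
    for chunk in COUNT_ORDER_CHUNKS:
        if chunk["first"] == first:
            return chunk
    return None


def count_model(start: int, end: int, spoken: Optional[list] = None) -> list:
    """Count from start to end, 'speaking' each number by recording it."""
    spoken = [] if spoken is None else spoken
    spoken.append(start)
    if start != end:
        chunk = retrieve_chunk(start)
        if chunk is not None:  # stop quietly if retrieval fails
            return count_model(chunk["second"], end, spoken)
    return spoken


if __name__ == "__main__":
    print(count_model(2, 5))
```

Counting stops early when no successor chunk exists, mirroring the `(if p ...)` guard in the Lisp rule.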


Debugging


[Screenshot: Lisp REPL session (CL-USER) showing debug-detail trace output for the past-tense model. The trace reports the implicit creation and presentation of chunks (LOSE, LOST, BLANK, HAVE, HAD), the selection of one rule by utility within the rule groups PAST-TENSE-MODEL and FORM-PAST-TENSE, a retrieve-chunk call with its spec (chunk type PASTTENSE), the number of matching chunks filtered and retrieved, and the rewards assigned to the strategy rules.]
Implemented Models

• 10 Classic models implemented:

- count, addition, siegler, zbrodoff, paired, fan, sticks,
semantic, choice, past-tense


* past-tense not yet complete


Efficiency 

• Sentence  production  (syntactic  priming)  model 

- 30  productions  in  ACT-R,  720  lines  of  code 

- 82  lines  of  code  in  ACT-UP  (3  work-days) 

- ACT-R  6:  14  sentences/second 

- ACT-UP:  380  sentences/second 
Scalability

• Language evolution model

- Simulates domain vocabulary emergence
(ICCM 2009, JCSR 2010)

- 40 production rules in ACT-R (could not prototype)

- 8 participants interacting in communities

• In larger community networks: 1000 agents, 84M interactions
(about 1 minute sim. time each), 37 CPU hours

Rapid  prototyping/Reuse 

• Dynamic  Stocks&Flows  model  (JAGI2010) 

- Competition  entry,  model  written  in  < 1 person-month 

- Instance-based  learning  (IBL,  Gonzales&Lebiere 
2003) 

- Blending  (Wallach&Lebiere  2003) 

- free  parameters  (timing)  estimated  from  example 
data 

- Model  generalized  to  novel  conditions 
• (....NOT,  but  it  did  so  better  than  others.) 

• Same IBL/blending micro-strategy was re-used directly
in a Lemonade Stand Game entry to a 2009
competition (BRIMS 2010)
Drawbacks

• Less established code-base than ACT-R 6

• Lisp

• Lack of architectural timing predictions from
rule matching

• Lack of parallelism (planned: fall 2010)

• Lack of perception/motor modules

- Will be available in ACT-Simple-style interface
(Salvucci&Lee 2003)


Beta-Test

• Limited Release of ACT-UP test version

- comes with 10 example models

- 4 tutorials (paralleling the ACT-R 6 ones)

- Full API documentation plus How-do-I... document

• Testing  period:  Fall  2010 

• Task:  implement  1-2  models  of  your  own 

• Review  letter  requested  (journal-review  style) 
5.4 Examining the Relationships Between Education, Social Networks and
Democratic Support With ABM

Abstract. This paper introduces an agent-based model that explores the relationships between education, social networks, and
support for democratic ideals. The study examines two factors that affect democratic support: education and social networks.
Current theory concerning these two variables suggests that positive relationships exist between education and democratic support
and between social networks and the spread of ideas. The model contains multiple variables of democratic support, two of which
are evaluated through experimentation. The model allows individual entities within the system to make “decisions” about their
democratic support independent of one another. The agent-based approach also allows entities to use their social networks to
spread ideas. The experimental results agree with current theory. In addition, these results show the model is capable of reproducing
real-world outcomes. This paper addresses the model creation process and the experimentation procedure, as well as future
research avenues and potential shortcomings of the model.

1.  Introduction 

How do democracies arise? It is not possible to answer this question in a simple
or quick manner. Democracies tend to take decades to fully form, so studying the
variables that lead to their rise is not a task one can achieve in a week. The process
requires extensive examination of literature and history. However, the study of the
relationships between variables that affect democracy can occur in a much shorter
period using Agent Based Modeling (ABM).

In order to conduct initial experimentation with this new method, this study only
examines the effects of a few variables on the rise of democracy. The main variable
we examined is education, with social networks serving as a medium for ideas to
spread, allowing us to examine the effects of social networks on idea transference.

Based on a study of the relevant literature, the hypothesis for this experiment is:
increases in the education transfer variable will lead to an increased number of
democratic supporters over a 100-step (year) cycle.

2.  Literature  Review 

A review of the current literature on democracy revealed that democratic ideals
influenced by education positively affect support of democracy [13], [15], [8], [5], [4].
Democratic states and states transitioning to democracy often have strong liberal
education systems. These systems help pass on the basis of democratic ideals to
every generation, resulting in a population that approves of and supports democracy
[7], [4]. Where these strong liberal education systems are lacking, states often
experience lower levels of support for democracy [6].

Much of the literature focuses on the fact that creating a culture of democracy is
important to improving democratic support. This ties into the concept of democratic
ideals, or a system of beliefs that matches a democratic form of government [5], [8].
Consequently, where these ideals are weak or completely absent, one would expect
democracy to be non-existent, or the system severely flawed. Therefore, the
current theory concludes that support for democracy, on a national level, is
contingent on a system of democratic ideals, which the populace receives during
the education process.

In addition to education, the literature highlights the relationships between
democracy and other cultural and economic influences, with economic variables being a
common theme in most of the literature. While this variable appears to influence
democratic ideals in conjunction with education, there is no clear connection for
how the two relate to each other. While it would seem safe to assume that
economically secure individuals would receive better educations, none of the
literature clearly states this. As a result, for the purpose of this study, economic
factors will remain locked and consist only of a random distribution of wealth among
agents.

An additional variable, which the literature identifies as important to the spread of
ideas, is social networks. Current literature highlights several ways that an individual’s
social network can influence them. Many individuals find themselves in situations
surrounded by others that share the same democratic ideals as they do, but they also
find themselves surrounded by individuals that have different democratic ideals [18].
The likelihood that an individual will accept another’s democratic beliefs is based on
how strong their current beliefs are, as well as the amount of effort the other person
expends trying to instill their democratic ideals. For these reasons, the literature
claims there is no more than a fifty percent chance that an individual will accept
influence from either side [18].

As opposed to the immediate influence that individuals receive from others within a
particular propinquity, current literature also discusses how individuals accept influence
from their friends and family. Unlike influences applied by individuals in a
person’s proximity, a person can choose the friends and family from whom they are
willing to accept influence. Recent literature argues that during the current era people
are not limited to only accepting influence from friends and family that hold the same
beliefs. In fact, since democratic ideals can change at a rapid rate, individuals are
willing to accept influence from those with the same ideals, as well as those with
differing views [16], [17].

Because of the above-mentioned literature, this study not only looked at
individuals within a person’s immediate proximity, but also the individuals that are
involved within a person’s far-reaching network. Additionally, this study did not limit
individuals to accepting influence from those who share the same democratic ideals, but
allowed for individuals with differing democratic ideals to influence a person.

3.  Methodology 

To create the model for this study we followed a two-step process: create a
metacode, and then input the true code into NetLogo. NetLogo is an agent-based
modeling environment developed on the Java platform. The software allows users to
create and program agents, giving them sets of instructions for interactions and
behaviors. The user can then create interactions amongst agents, and
experiment with the interactions to determine how changes in individual
behavior affect the overall behavior of the model. Metacode is a rough outline of the
intended process for a program, in this case an agent-based model, and represents a
high-level view of how the model will function.

After we created the metacode, we began to write the program in NetLogo. While
transferring the metacode into true code we often found problems that required us to
add modules to the main program and in some cases change some of the basic
processes. Figure 1 shows the final model format, in NetLogo, with the added variables
and the final variable control formats.

Agents  within  the  model  follow  a set  of 
procedures  to  perform  the  following  actions 
during  each  "step”:  they  decide  whether  to 


585 


educate  or  not;  they  receive  education  (if 
they  are  at  a location);  they  interact  with 
their  social  network  and  local  community; 
they  decide  whether  to  become  a supporter, 
detractor  or  remain  neutral;  they  perform 
actions  to  possibly  give  birth  or  die;  and 
they  move.  To  examine  the  main  variable, 
education,  agents  within  the  system  perform 
an  initial  check  to  determine  two  things:  are 
they  close  enough  to  an  education  location 
to  attend  and  are  they  the  right  age  to 
attend.  The  radius  in  which  agents  must  be 
to  attend  a location  is  determined  using  a 
slider  (education-influence-radius),  which 
we  did  not  adjust  for  this  experiment  due  to 
computer  processing  constraints.  Along 
with  the  number  of  locations  available  for 
agents  to  receive  education,  we  felt  that 
increasing  or  decreasing  these  variables 
would  result  in  predictable  outcomes 
(agents  would  be  more  likely  to  support 
democracy  where  radius  and  education 
locations  were  high  and  vice  versa).  The 
important  variable  we  did  allow  to  change  in 
order  to  examine  the  effects  of  education 
was  the  level  of  education  agents  received 
at  the  education  locations.  Agents  who 
attended  a location  receive  X amount  of 
"education”  each  year  until  they  reach  the 
age  of  1 8.  Agents  in  the  model  do  not  have 
to  go  to  an  education  center  unless  they  are 
within  the  variable  range  determined  by  the 
education-influence-radius.  Therefore, 
agents who "live" away from education
centers (those further away from a center
than the value of the education-influence-radius
variable) would not receive education,
while  those  close  by  would.  In  addition, 
agents  could  receive  education  anytime 
after  the  age  of  five,  until  they  were  18. 
Therefore,  agents  not  encountering  a 
location  when  their  age  reached  the 
minimum  could  still  enter  a location  later. 
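
The education check described above can be sketched as follows. This is a minimal Python illustration, not the authors' NetLogo code: the `Agent` class, the function names, and the treatment of one step as one year are assumptions, and the age window is taken as 5 through 17 inclusive because the text does not state the exact boundaries.

```python
import math
from dataclasses import dataclass

MIN_AGE = 5   # education can begin at age five (from the text)
MAX_AGE = 18  # education stops once the agent reaches 18 (from the text)


@dataclass
class Agent:
    # Hypothetical agent record; the original model's fields are not shown in the paper.
    x: float
    y: float
    age: int
    education: float = 0.0


def can_attend(agent, locations, radius):
    """Return True if the agent is of school age and within `radius` of any location."""
    if not (MIN_AGE <= agent.age < MAX_AGE):
        return False
    return any(math.hypot(agent.x - lx, agent.y - ly) <= radius
               for lx, ly in locations)


def educate(agent, locations, radius, amount):
    """Add `amount` of education for this step if the agent is eligible."""
    if can_attend(agent, locations, radius):
        agent.education += amount
    return agent.education
```

Because eligibility is re-checked on every step, an agent who is out of range at age five can still begin receiving education later if it moves within the radius before turning 18.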

At  the  beginning  of  each  run  of  the  model, 
agents  look  within  a certain  radius,  as  well 
as  looking  to  a certain  number  of  other 
agents  in  their  extended  social  network,  to 
receive  influence  (support  or  detract).  The 
model  contains  a multitude  of  options  for 
adjusting agents' social networks. The
model allows for the selection of the
immediate radius within which each agent will
look for support influence. As the range
of  the  social  network  increases,  the  agents 
will  have  more  companions  from  which  they 
can  draw  either  positive  or  negative 
democratic  support.  Within  this  process,  we 
built  in  a measure  of  randomness  by 
ensuring  the  distribution  of  agents  would 
result  in  different  numbers  of  neighbors  in 
each  individual’s  range.  After  each  step  in 
the  model,  the  agents  move  a couple  of 
spaces  in  different  directions;  this  allows 
agents  to  move  in  and  out  of  the  influence 
radius  of  others. 

As  for  the  extended  social  networks  of  the 
agents,  or  more  simply  a network  that  is  not 
limited  to  a certain  radius,  there  are  also 
options  that  allow  the  user  to  manipulate  the 
model.  First,  the  user  has  the  option  to 
choose  in  which  type  of  extended  network 
the  agents  will  participate.  The  three 
options  are  normal,  uniform,  or  constant 
distributions. The normal option
assigns each individual a number of agents
to participate in their extended network
using the normal distribution to determine
the exact number. The uniform option uses
a simple random procedure to determine
the number of agents within a certain
individual's extended network. The uniform
distribution  does  not  follow  the  bell  curve 
but  allows  every  number  in  the  random 
number  range  to  have  an  equal  opportunity 
of  being  selected,  resulting  in  random 
numbers  of  agents  in  each  network.  Lastly, 
the  constant  distribution  gives  all  agents  the 
same  number  of  individuals  within  their 
extended  network. 
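
A minimal sketch of the three network-size options, in Python rather than NetLogo. The function name, the `spread` parameter, and the uniform range of 0 to twice the mean are hypothetical choices made for illustration; the paper does not give the actual ranges.

```python
import random


def extended_network_size(kind, mean, rng, spread=2.0):
    """Draw the number of agents in one individual's extended network.

    kind   -- "normal", "uniform", or "constant" (the three options in the text)
    mean   -- integer target size (hypothetical parameter)
    rng    -- a random.Random instance, so runs can be seeded
    spread -- standard deviation for the normal option (assumed value)
    """
    if kind == "normal":
        size = round(rng.gauss(mean, spread))   # bell curve around the mean
    elif kind == "uniform":
        size = rng.randint(0, 2 * mean)         # every value equally likely
    elif kind == "constant":
        size = mean                             # same size for every agent
    else:
        raise ValueError(f"unknown distribution: {kind}")
    return max(0, size)                         # a network cannot be negative
```

Passing a seeded `random.Random` makes a run repeatable, which helps when comparing the three options against each other.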

During  every  time  step  of  the  model, 
agents  look  within  their  social  network, 
which  includes  their  immediate  radius  and 
extended  social  network,  and  determine 
agents  from  which  they  will  accept  either 
democratic  support  or  non-democratic 
support.  In  order  to  do  this,  the  model  is 
designed  to  follow  the  assumptions 
described  in  the  literature,  and  a coin-flip 
procedure  determines  whether  the  agents 
accept  influence  (i.e.  agents  have  a 50/50 
chance  of  accepting  or  rejecting  influence). 
This  works  the  same  for  both  democratic 
influence and non-democratic influence. If
an agent accepts influence, the amount they
accept, which remains constant, is either
added to or subtracted from the democratic
support variable. Agents receive greater
influence from their non-immediate network
(representative of their family and friends)
than  from  their  neighborhood.  We  made 
this  decision  because  influences  an 
individual  receives  from  family  and  friends 
tend  to  be  stronger  than  influences  they 
accept  from  strangers.  If  agents  accept 
democratic  influence,  the  support  variable 
increases,  but  if  they  accept  non-democratic 
influence,  the  support  variable  decreases. 
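
The coin-flip acceptance rule can be sketched as below. This is an illustrative Python version under stated assumptions: the weights `LOCAL_WEIGHT` and `EXTENDED_WEIGHT` are hypothetical values chosen only so that the extended network counts for more than the neighborhood, as the text describes.

```python
import random

LOCAL_WEIGHT = 1.0     # hypothetical: influence from the immediate radius
EXTENDED_WEIGHT = 2.0  # hypothetical: larger, since family and friends count for more


def apply_influence(support, influences, rng):
    """Update a democratic-support value from a list of offered influences.

    influences -- iterable of (stance, source) pairs, where stance is +1
                  (democratic) or -1 (non-democratic) and source is
                  "local" or "extended"
    rng        -- object with a random() method returning a float in [0, 1)
    """
    for stance, source in influences:
        if rng.random() < 0.5:  # coin flip: accept or reject this influence
            weight = EXTENDED_WEIGHT if source == "extended" else LOCAL_WEIGHT
            support += stance * weight
    return support
```

The same flip is used for both kinds of influence, so only the sign and the source weight differ between a democratic and a non-democratic contact.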

In  addition  to  the  two  main  variables  we 
examined,  agents  also  performed  checks  to 
gain  or  lose  wealth  and  to  decide  whether  to 
support  democracy  or  not.  Because 
economics  was  not  a focus  of  our 
experiment  or  our  hypothesis,  but  is  an 
important  variable  in  democratic  support,  we 
included  a procedure  to  allow  agents  to  gain 
or lose wealth. For simplicity's sake we used
very basic procedures to allow agents to
gain or lose wealth; if an agent is in the
upper 15% of the population in wealth they
have a greater chance to gain more wealth,
while those below the median have an
equal chance to lose or gain wealth. We
felt  this  procedure  was  necessary  to 
represent  the  fact  that  individuals  with  large 
amounts  of  wealth  are  better  able  to  protect 
their  wealth  and  may  be  able  to  continue  to 
gain  it,  while  those  with  less  wealth  have  a 
harder  time  protecting  and  gaining  wealth. 
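
A minimal sketch of the wealth procedure, written in Python for illustration. The gain probability for the top 15% (0.75), the step size, and the treatment of agents between the median and the top 15% (given the same even odds as those below the median) are assumptions; the paper specifies only the two cases described above.

```python
def gain_probability(wealth, all_wealth, top_gain_prob=0.75):
    """Chance that an agent gains (rather than loses) wealth this step."""
    ranked = sorted(all_wealth)
    # Smallest wealth value that still falls in the upper 15% of the population.
    cutoff = ranked[len(ranked) * 85 // 100]
    if wealth >= cutoff:
        return top_gain_prob  # assumed value; the text says only "greater chance"
    return 0.5                # equal chance to lose or gain


def update_wealth(wealth, all_wealth, rng, step=1.0):
    """Move wealth up or down by `step` according to the agent's gain probability."""
    if rng.random() < gain_probability(wealth, all_wealth):
        return wealth + step
    return wealth - step
```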

Agents  follow  a procedure  of  checking 
their  wealth,  education  and  support  levels  to 
determine if they will become a supporter or
detractor. We set thresholds for these
variables (for support and wealth they did
not change) and agents check all three,
deciding to be a detractor if they fall below
all three threshold levels, and deciding to be
a supporter if they are above. As
explained  previously,  agents  accept  support 
or  non-support  from  their  neighbors  and 
social  network.  We  included  this  variable 
and  allowed  it  to  shift  in  order  to  provide  a 
way  to  examine  the  effects  of  social 
networks, and to allow agents to decide
whether to support democracy based on
variables  other  than  just  wealth  or 
education.  Because  it  is  not  realistic  to 
assume  that  all  educated  and  wealthy 
people  will  automatically  support 
democracy,  we  included  the  democratic 
support  variable  to  allow  agents  to  decide 
not  to  become  supporters,  even  if  they  were 
wealthy  and  educated. 
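
The three-threshold decision can be sketched as follows. This is an assumed Python rendering: the `Thresholds` type and the use of separate supporter and detractor threshold sets are hypothetical, chosen because the paper later refers to a distinct detractor threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical container for one set of threshold levels.
    wealth: float
    education: float
    support: float


def decide(wealth, education, support, supporter_min, detractor_max):
    """Return "supporter", "detractor", or "neutral" for one agent."""
    if (wealth > supporter_min.wealth
            and education > supporter_min.education
            and support > supporter_min.support):
        return "supporter"   # above all three supporter thresholds
    if (wealth < detractor_max.wealth
            and education < detractor_max.education
            and support < detractor_max.support):
        return "detractor"   # below all three detractor thresholds
    return "neutral"         # mixed results leave the agent undecided
```

Requiring all three conditions is what lets a wealthy, educated agent stay neutral when its support value is low.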


4.  Results 

In  order  to  experiment  with  this  model  we 
ran  12,961  trials  using  a variety  of  variable 
settings.  Utilizing  the  behavior  space 
feature  within  NetLogo,  we  were  able  to 
sample  six  variables  across  multiple 
settings. For several variables (Populace-education,
Democratic-educated, and
Democratic-uneducated) we did not sample
the  entire  variable  range  due  to  time 
constraints.  In  addition,  we  did  not  include 
the  remaining  sliders  and  switches 
(education-location,  death  rate,  birth  rate, 
and  network  distribution)  in  this  experiment 
because  we  did  not  wish  to  test  their  direct 
effects  on  democracy. 
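
Sampling several variables across multiple settings amounts to enumerating combinations of values. The sketch below shows that idea in Python; it is not the BehaviorSpace configuration used in the experiment, and the setting values in the usage note are hypothetical.

```python
from itertools import product


def full_factorial(settings):
    """Return one dict per combination of the sampled values.

    settings -- mapping from variable name to the list of values to sample
    """
    names = list(settings)
    return [dict(zip(names, combo))
            for combo in product(*(settings[name] for name in names))]
```

For example, `full_factorial({"Populace-education": [2, 4, 6, 8, 10], "Network-density": [0, 5, 10], "local-community": [1, 2, 3]})` yields 45 parameter sets, one per trial. These values are illustrative only.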

Results  of  our  experiment  showed  that 
overwhelmingly  democratic  supporters 
outnumbered  democratic  detractors  (91%  of 
the  time). 


          Supporters     Detractors     Total
          outnumber      outnumber      number
          Detractors     Supporters     of runs

Runs      11,795         1,166          12,961

Table 1: Total times each group
outnumbered the other
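
The shares quoted in the text follow from the run counts. The short Python check below assumes every run falls into exactly one of the two categories, so the supporter-majority count is the total minus the detractor-majority count.

```python
TOTAL_RUNS = 12_961
DETRACTOR_MAJORITY_RUNS = 1_166
# Assumption: no ties, so the remaining runs had a supporter majority.
SUPPORTER_MAJORITY_RUNS = TOTAL_RUNS - DETRACTOR_MAJORITY_RUNS


def share(count, total):
    """Return `count` as a percentage of `total`."""
    return 100.0 * count / total


supporter_share = share(SUPPORTER_MAJORITY_RUNS, TOTAL_RUNS)
detractor_share = share(DETRACTOR_MAJORITY_RUNS, TOTAL_RUNS)
```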

In addition, the average percent of agents
in  the  system  that  were  supporters  was 
33%,  while  only  5%  of  the  agents  were 
detractors. 


Average        Average        Total
number of      number of      number
Supporters     Detractors     of agents

90             15             300

Table 2: Average results of a run


Across  multiple  runs,  this  demonstrates 
that  in  almost  all  instances  democratic 
support  arose  within  the  model,  across 
multiple variable settings. However, the 9%
of runs that resulted in democratic
detractors being the majority demonstrate
that variable settings did affect the outcome.

In  order  to  gain  a better  understanding  of 
what  our  results  showed,  we  constructed  a 
linear  regression  model  of  the  results, 
finding  that  all  variables  except  network 
density  had  a significant  effect  on 
democratic supporters (results appear in
appendix 1). Our regression model
included all variables from the model that
changed (Populace-education,
Democratic-educated, Democratic-uneducated,
Network-density, and local-community).

The resultant adjusted R-square value of
0.544 shows that the regression captured a
little over half of the variability in the
number of democratic supporters across
runs. In addition, the high F value
(2581.219) shows that the model was
significant.
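
For readers who want to relate the two reported statistics, the standard ordinary-least-squares identities can be sketched in Python as below. The function names are hypothetical, and treating each of the 12,961 runs as one observation (`n`) with `k` predictors is an assumption; the paper does not state the degrees of freedom.

```python
def r2_from_adjusted(adj_r2, n, k):
    """Recover R-square from adjusted R-square for n observations and k predictors."""
    return 1.0 - (1.0 - adj_r2) * (n - k - 1) / (n - 1)


def f_statistic(r2, n, k):
    """Overall F statistic of a linear regression with an intercept."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))
```

Calling `f_statistic(r2_from_adjusted(0.544, 12961, k), 12961, k)` for a candidate `k` gives the F value implied by the reported adjusted R-square, which can then be compared with the reported 2581.219 as a rough consistency check.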

The  regression  coefficients  from  this 
model  showed  that  almost  all  variables  had 
the  expected  relationships  with  our 
dependent  variable  (based  on  the  literature 
review).  One  variable  which  did  not 
demonstrate  an  expected  relationship  with 
democratic  support  was  the  detractor 
threshold  variable.  The  regression  model 
showed  that  as  the  threshold  to  become  a 
detractor  rose,  democratic  supporters  in  the 
model fell. While this result appears
counterintuitive, the regression analysis
does not take into account the overall
number of detractors and supporters in the
model (i.e. even as detractors within the
system fall due to a higher threshold,
democratic supporters in the system do not
necessarily rise). This result demonstrates
that  the  two  groups,  detractors  and 
supporters,  do  not  vary  based  on  each 
other’s  numbers.  This  finding  supports  our 
belief  that  the  model  adequately  captures 
real  world  behaviors.  Had  the  regression 
analysis  shown  that  these  variables  had 
opposite  relationships  with  the  dependent 
variable,  it  would  suggest  that  they  might 
have  an  effect  on  each  other  as  well.  For 
this  model  to  be  accurate,  the  number  of 
democratic  supporters  or  detractors  should 
not  influence  the  other  beyond  moderately 
affecting  the  size  of  the  influential 
population  pool.  This  result  is  in  no  way 
conclusive  that  the  two  variables  are  not 
connected,  but  it  does  indicate  their  limited 
connectivity,  which  implies  the  number  of 
supporters  and  detractors  within  the  model 
is  mostly  able  to  vary  independent  of  one 
another. 

The other variables within our model
demonstrated relationships that the
literature  suggests  should  exist.  All  three 
remaining  variables  that  were  significant 
had  positive  relationships  with  democratic 
supporter  numbers.  While  one  of  the 
variables  we  focused  on  (education)  had  a 
positive  significant  relationship  with 
democratic  supporters,  the  variables 
relating  to  social  networks  were  not  both 
significant. The variable representing the
agents' social network external to their
location (i.e. those agents not in direct
or near direct contact with them) was not
significant. However, because agents were
able  to  move  within  the  system,  this  is  likely 
the  cause  for  the  local  community  (agents  in 
direct  or  near  direct  contact  with  each  other) 
having  an  effect  on  the  outcome.  The  limit 
of  social  networks  in  this  model  is  that  they 
do  not  expand  as  agents  encounter  each 
other; since the social network is not able to
expand throughout the agents' “life”, it has a
fixed effect on the outcomes, which appears
to be insignificant. In order to verify this
finding we would need to re-run this
experiment and allow the agents' network
density to vary across several distributions
to determine if the effect is fixed or not.

5.  Conclusion

The  results  of  this  model,  across  multiple 
variable  settings,  indicate  that  the  model 
agrees  with  current  theory.  The  fact  that 
democratic  supporters  did  not  always 
outnumber  detractors  also  indicates  that  the 
outcomes  are  not  hard-coded  into  the 
model.  While  the  sensitivity  does  appear 
somewhat low (as demonstrated by the fact
that  democratic  supporters  outnumbered 
detractors  91%  of  the  time),  the  model  did 
not  produce  an  overwhelming  majority  of 
one  outcome  or  the  other.  If  we  had  added 
other  variables  or  added  additional  variable 
settings  in  the  experiment,  it  is  likely  that  the 
results  would  likewise  have  varied.  This  is 
especially  true  for  the  number  of  education 
locations.  In  order  to  further  test  the  effect 
of  education  on  democratic  support  we 
would  allow  the  number  of  locations  to  vary, 
and  examine  how  this  affects  the  model’s 
results. 

The  model  also  captured  the  relationship 
between  the  spread  of  democratic  ideals 
and  social  networks.  In  addition  to  the 
effects  education  had  on  democratic 
support,  the  relationship  between  individual 
agents  and  their  immediate  community 
reveals that external influences also play a
large  role  in  determining  an  individual’s 
views  of  democracy.  However,  the  model 
does seem to represent only a small
share of real world situations. The
inclusion  of  social  networks,  which  reach 
across  distances  greater  than  an  immediate 
“neighborhood”,  is  more  representative  of  a 
country  with  advanced  telecommunication 
networks.  Because  countries  do  not  all 
possess  advanced  communication  networks 
allowing  all  their  citizens  to  communicate 
with  friends  and  family  over  vast  distances 
instantaneously,  the  model  is  not 
representative  of  all  possible  states. 
However,  we  could  replicate  countries 
without  these  advanced  communications 
networks  by  removing  the  network  density 
variable  from  experimentation. 

While  this  model  is  not  representative  of 
all  countries,  which  will  take  further 
experimentation and testing to correct, it
does match well with the current state of the
world.  In  countries  with  advanced 
communication  networks  and  good 
education  systems,  the  predominant  form  of 
government  is  democracy.  Our  model 
adequately  reflects  this,  demonstrating  that 
the  model  is  relatively  accurate,  in  terms  of 
recreating  real  world  situations.  We  expect 
that  removing  social  networks  and  running 
the  experiment  again  would  likewise  affect 
the  model  and  produce  results  more 
representative  of  countries  without 
advanced  telecommunication  networks. 

Another  possibly  inaccurate  aspect  of  the 
model  is  the  number  of  education  locations 
we  allowed  to  exist.  For  the  purpose  of  this 
experiment,  we  decided  to  vary  the  level  of 
education  agents  received  and  not  the 
number  of  sources  where  they  could  receive 
their  education.  We  expect  that  lowering  or 
raising  the  number  of  locations  will  have  the 
same  impact  on  the  number  of  democratic 
supporters.  Based  on  the  construction  of 
the  model,  a high  number  of  locations  will 
inherently  affect  more  agents  and  introduce 
more  education  into  the  model.  We 
therefore  decided  to  remove  this  variable 
from  this  experiment  as  we  expected  its 
impact  would  be  too  great  on  the  outcomes 
of  our  experiment.  In  future  tests,  we  would 
include  this  variable  and  examine  how  it 
affects the model's results. Should it
produce  results  differing  from  what  we 
expect,  it  would  provide  interesting  insight 
into  how  the  number  of  education  locations 
available  to  individuals  may  positively  or 
adversely  affect  their  education. 

In  terms  of  validation,  this  model  appears 
to  be  a valid  representation  of  the  real 
world.  However,  we  could  not  identify  a real 
world case to compare our results to, in
furtherance of this conclusion. In future
validation procedures we plan to empirically
test this model against real world cases of
countries where democratic support is the
majority opinion of the people, and
some where it is not. If through further
validation  our  model  proves  to  be  an 
accurate  representation  of  real  world 
situations  then  our  results  would  further 
reinforce  current  theory  concerning 
education and democracy. The results
would  also  support  the  notion  that  social 
influences  are  an  important  factor  in 
determining  an  individual’s  support  of 
democracy.  Our  results  from  this 
experiment  support  this  notion  and  suggest 
that  individuals  are  heavily  influenced  both 
by the people they come in contact
with,  and  by  the  education  they  receive. 

6.  Appendix 

6.1  Regression  Tables 

Democratic  Supporters  Regression  Model 
Tables 


Education, Social Networks
and  Democratic  Support 


Nick  Drucker 
Kenyth  Campbell 
MODSIM  WORLD  2010 


Purpose 

• Examining  how  we  can  leverage  modeling 
and  simulation  techniques  in  the 
International  Relations  field 

• We  were  interested  in  the  emergence  of 
democracy  supporters  and  detractors  from 
a neutral  population 


Research Question


What  are  the  effects  of  education  on  the 
emergence  of  democratic  supporters  in 
a socially  connected  country? 


Agent  Based  Modeling 

• Agents  make  decisions  based  on  a set  of 
rules 

• In  the  Democracy  Emergence  model,  agents 
were: 

- Supporters 

- Detractors 

- Neutral 

• Supporters  and  Detractors  influence  the 
neutral  population,  who  are  eligible  to  receive 
education  under  certain  circumstances 


Model  Logic 

• Agents  make  decisions  based  on  three  variables 

- Education  is  determined  by  the  independent  variables  in  the 
model 

- Social  network  provides  direction  for  idea  transference 

- Wealth  is  uniformly  distributed  and  changes  in  a fixed  manner 

• Education  was  the  influential  variable  and  was 
determined  by: 

- Radius of education center's influence

- Amount of education transferred from a center to an agent

- Agents' initial education


Model  Logic 

• Thresholds  for  all  three  variables  must  be  met 
to  support  or  detract 

• Supporters  and  detractors  influence  their  social 
networks 

• 100  step  process  with  agents  dying  and  being 
born  continuously 


Social Network


• Consists  of  two  parts 

- Local  radius  variable  (local-community) 

- Extended  network  variable  (network-density) 

• Each  of  these  variables  has  agents  look  to 
other  agents  for  potential  influence. 

- Local  radius  variable  has  agents  look  to 
neighbors  within  a certain  radius. 

- Extended  network  variable  has  agents  look  to 
friends/family  within  a social  network. 


Social  Network 

• Potential  Influence  is  accepted  via  a coin- 
flip  scenario. 

• Influence  received  from  other  agents  can 
either  be  positive  or  negative. 

• Influence is transformed into a support
variable that is part of determining whether
agents become supporters or dissidents.


Assumptions and Expectations


• Agents exist in an electronically connected
environment (e.g. they are able to
communicate instantaneously with their
distributed network)

• Once an agent is a supporter or detractor
they can no longer be influenced

• There is a positive relationship between
education and democracy; this
assumption is based on current theory


Democracy  Emergence  Model 


Experimentation


• Set  parameters  for  independent  variables 
(upper  and  lower  limits) 

- Ex.  Education  influence  radius  (0-10) 

• Established  points  within  the  parameters 
to  sample 

- Ex. Education transference variable (2, 4, 6, 8, 10)

• Ran  12,961  trials  with  different 
combinations 


Results 

• 91% of runs resulted in a higher level of support
for  democracy  than  initial  conditions 

• On  average  35%  of  agents  in  the  system  were 
democratic  supporters;  5%  were  detractors 

• Linear  regression  analysis  revealed 
relationships  that  support  the  literature 

- Higher education transfer, lower education support
threshold, and higher initial education resulted in
more supporters

- Lower  threshold  to  become  a detractor  did  not  result 
in  less  democratic  support  (unexpected) 


Conclusions


• We  were  successfully  able  to  apply  ABM 
principles  to  an  International  Relations 
topic 

• The  model  aligns  closely  with  literature 
- Appears  to  capture  real  world  situations 

• Democratic  support  was  not  the  only  end 
state  outcome,  non-democratic  support 
was  also  observed 


Future  Research 

• Apply  to  other  Socio-Cultural  issues 

• Applying  it  to  additional  variables  that 
impact  Democracy  and  its  emergence 


Questions?


5.5  Connections, Fallacies, and Potential Directions


CONNECTIONS,  FALLACIES,  AND 
POTENTIAL  DIRECTIONS 

Presented  by  Jon  Compton 


PROBLEM 

Quote  of  the  hour  (well,  the  next  20  minutes  anyway....) 


“All  models  are  wrong; 
some  are  useful.” 

- George  Box 


PROBLEM

Lies,  damned  lies,  and  statistics 


A study published in the American Journal of Clinical Nutrition shows that
breastfed infants tested 5.2 IQ points higher than formula fed infants, for a
comprehensive study involving 11 different studies and over 7000 children.


"Statistics are like bikinis. What they reveal is suggestive, but what they conceal is
vital." -Aaron Levenstein


PROBLEM 

Forest  but  for  the  trees,  trees  but  for  the  leaves 


One of the major problems faced by the early pyramid builders was the need
to move huge quantities of rock. While 80 men can drag a 2.5-ton block of
stone on a sled, as depicted in carvings in some later Egyptian tombs, this
brute-force method was not very efficient.


"Two quite opposite qualities equally bias our minds - habits and novelty." - Jean de la
Bruyère

"When knowledge is well guarded, it's easily lost." -Anon.


PROBLEM

What  econometricians  know.... 


Without doubt the most widely cited review is the classic Meta-analysis of
research on the relationship of class size and achievement (Glass & Smith,
1978). The two primary conclusions drawn from this material are:

* reduced  class  size  can  be  expected  to  produce  increased  academic 
achievement 

* the  major  benefits  from  reduced  class  size  are  obtained  as  the  size  is 
reduced  below  20  pupils 


"I never let schooling interfere with my education." -Mark Twain.


PROBLEM 

The Terms of the Discourse....


The  Climate  change  debate: 

Do  you  believe  in: 

Climate  Change? 

Global  Warming? 

Anthropogenic  Global  Warming? 

Catastrophic  Anthropogenic  Global  Warming? 

And just who are the "flat Earthers" and just what don’t they believe in?

"The most perfidious way of harming a cause consists of defending it deliberately with
faulty arguments." - Friedrich Nietzsche


PROBLEM

Where  theory  and  culture  diverge 


Prior  to  the  development  of  consciousness, 
Julian  Jaynes  argues  humans  operated 
under  a previous  mentality  he  called  the 
bicameral ('two-chambered') mind. In the
place  of  an  internal  dialogue,  bicameral 
people  experienced  auditory  hallucinations 
directing  their  actions,  similar  to  the 
command  hallucinations  experienced  by 
people  with  schizophrenia  today.  These 
hallucinations  were  interpreted  as  the  voices 
of  chiefs,  rulers,  or  the  gods. 


“The weight of original thought in it is so great that it makes me uneasy for the author's
well-being . . .” -D.C. Stove on Jaynes's The Origin of Consciousness


PROBLEM 

The  Terms  of  the  Discourse 


What  is  the  definition  of  consciousness? 

• Abstract  thought? 

• Memory? 

• Language? 


“The visionary lies to himself, the liar only to others.” - Friedrich Nietzsche


PROBLEM


Even supermen must amuse themselves (sometimes at
the expense of the lives of others)


CHUMP

(Compton's Heuristic Ultracrepidarian Macromantic Pantomancer)


CONNECTIONS 

The  beauty  of  the  rationality  assumption  (a  tired 
tale  indeed) 


But  what  is  the  real  problem? 


Do  you  play  chess? 


“The irrationality of a thing is no argument against its existence, rather a condition of it.” -
Friedrich Nietzsche


CONNECTIONS


No,  it’s  the  theory,  stupid 


ASSUMPTIONS 

Never Assume What You're Trying to Prove,
Unless You're Trying to Prove You're a Bonehead.


CASES 

Who  are  these  folks  and  what  do  they  have  in 
common? 


Okay,  this  one  is  cheating, Hiroko  Nagata 


CASES

Terrorism: common assumptions


1. Terrorists are rational actors (Richardson)

2. Terrorism is largely the result of poverty or unequal distributions of wealth
(Crenshaw)

3. Terrorists act to achieve specific political goals (Richardson)

4. Terrorism is a response to external pressures, such as foreign occupation
(Pape)

5. Terrorism occurs for nationalist or separatist reasons (Williams)

6. Terrorism is a result of religious extremism (take your pick)


"Fighting terrorism is not unlike fighting a deadly cancer. It can't be treated just where it's
visible - every diseased cell in the body must be destroyed." - David Hackworth


PROBLEMS  > CONNECTIONS  > CASES  > WHAT  NOW? 

What  are  the  problems  with  our  common 
assumptions? 


The  seven  problems  identified  by  Max  Abrahms 

1. Terrorism fails to achieve the stated goal almost all of the time

2.  Terrorism  is  almost  never  used  as  a last  resort 

3.  Terrorist  organizations  almost  always  reject  compromises  despite 
significant  policy  concessions 

4. Political goals of terror organizations are, without exception, protean

5.  Terrorist  attacks  are  usually  anonymous 

6.  Competing  terror  groups  with  identical  or  highly  similar  goals  generally 
prefer  to  attack  each  other  than  any  other  target 

7. Terror groups seldom disband despite the consistent failure of the tactic to
actually accomplish their objectives

"Everybody's worried about stopping terrorism. Well, there's a really easy way: stop
participating in it." -Noam Chomsky

"Noam Chomsky is a dumbass" -Sebastian Synclair


PROBLEMS > CONNECTIONS > CASES > WHAT NOW?

What  are  the  commonalities? 


What  are  the  common  factors  in  these  groups? 

Identity  Entrepreneur 

Each  has  a founding  member  with  charismatic  appeal,  a perception  of 
injustice,  and  a... 

Commitment  to  Violence 

Each  group  has  a core  of  people  who  hold  a commitment  to  violence 
that  shapes  the  goals  and  philosophy  of  the  group  (not  the  other  way 
around).  This  core  attracts  people  who  seek... 

Social  Affiliation  and  Identification 

The  bulk  of  these  groups  are  filled  out  with  people  who  lack  the  strong 
commitment  to  violence,  but  self  identify  with  the  lifestyle,  philosophy, 
social  atmosphere,  and  so  forth,  of  the  group. 

“I  don't  worry  about  terrorism.  I was  married  for  two  years.”  -Sam  Kinison 


CASES 

Who  are  these  folks  and  what  do  they  have  in 
common? 


PROBLEMS > CONNECTIONS > CASES > WHAT NOW?

What  are  the  commonalities? 


What  are  the  common  factors  in  these  people? 

Traumatic  Experience  Leading  to  Anger 

Each  has  had  some  form  of  trauma  as  a child  which  spurred  deep- 
seated  resentment. 

Strong  Personal  Commitment  to  Violence 

Once  begun,  the  commitment  to  violence  against  their  chosen  victims 
over-rides  inhibitions. 

Objectification  of  Victims 

To accomplish the over-ride of inhibition to violence, victims are
objectified (held apart) from their perception of humanity.

"People don't know me. They think they do, but they don't." -Andrew Cunanan


PROBLEMS  > CONNECTIONS  > CASES  > WHAT  NOW? 

Wait,  What? 


What  are  the  common  factors? 
Traumatic  Experience  Leading  to  Anger 
Strong  Personal  Commitment  to  Violence 
Objectification  of  Victims 

This  is  how  we  kill. 

But...

What  about  the  capacity  for  obsession? 


Figuring Out Where to Look....

Brain  Hemispheres 


Characteristics: 
Narrow  Focus 
Linear 

Non-multi-tasking 

Decontextualizing 

Abstractive 

Specialized 


Characteristics: 

Broad  Focus 

Non-Linear 

Multi-tasking 

Contextualizing

Synthesizing

General


Figuring  Out  Where  to  Look.... 

This  is  a TEST! 


Left  Brain  - Right  Brain  Conflict 


Look at the chart and say the color, not the word

YELLOW  BLUE  ORANGE 
black  red  green 

PURPLE  YELLOW  RED 
orange  GREEN  BLACK 
BLUE  RED  PURPLE 
GREEN  BLUE  ORANGE 


Left  - Right  Conflict 

Your right brain tries to say the color but
your left brain insists on reading the word


According to Iain McGilchrist, the hemispheres of the brain vie for dominance in
the individual, and that struggle manifests itself not only in the individual, but in
society at large. This is because environmental conditions influence which side
is dominant, and there are feedback effects.


Synthesis

How  Do  the  Hemispheres  Communicate? 


Remember  Jaynes? 

If Bicameral Mind theory is correct, the development of
complex metaphor in language enabled modern
consciousness.


What  do  Linguists  think  of  that? 

The  Sapir-Whorf  Hypothesis 

Weak hypothesis: Structural differences between language systems will, in
general, be paralleled by nonlinguistic cognitive differences, of an
unspecified sort, in the native speakers of the language.

Strong hypothesis: The structure of anyone's native language strongly
influences or fully determines the worldview he will acquire as he learns the
language.


"Language forces us to perceive the world as man presents it to us." -Julia Penelope


Synthesis 

Cracked  Pots.... 


If these observations and theories are correct:

1. There should be a correlation between language and hemispheric
dominance  in  the  brain. 

2.  Strong  left-brain  dominance  in  population  groups  should  correlate  with 
higher  likelihoods  of  violent  behavior,  rational  or  otherwise. 

3.  Permissive  or  non-permissive  environments  should  moderate  the  levels 
of violence originating from a particular social or cultural group.

4.  These  hypotheses  should  be  testable. 

5.  We  should,  therefore,  be  able  to  identify  population  groups  with  higher 
likelihoods  of  violent  behavior,  and  be  able  to  apply  policy  decisions 
based upon the degree of permissiveness in the environment.

"Follow the path of the unsafe, independent thinker. Expose your ideas to the danger of
controversy. Speak your mind and fear less the label of "crackpot" than the stigma of
conformity." -Thomas John Watson, Sr.

"Arnold has had his spokesman call me a crackpot. That was a mistake." -Warren Beatty


Synthesis 

This  Pot  Has  Definitely  Cracked.... 


Herein  lies  the  end  of  the  nutty  part.  The  rest  is  boring. 

Thank  you  for  your  attention. 

Comments  are  always  welcome. 

If  you  want  to  see  the  rest,  I’m  happy  to  go  through  after 
the  session. 

“Quotation  is  a serviceable  substitute  for  wit."  -Oscar  Wilde 


CASES 

What  does  the  data  say? 


Incidents with Per Capita GDP, GINI, and RPC (chart; axis labels: Log of Per Capita GDP, Membership)


PROBLEMS  > CONNECTIONS  > CASES  > WHAT  NOW? 


A way to conceptualize... The power of metaphor


Violent  Systems  Theory:  terror  groups  as 
living  organisms 

Entails: 

• Life  cycle 

• System  congruence 

• Negative  entropy 
Overriding goal: SURVIVAL


Negative Entropy

Required for VNSA
survival. Inputs sought
until input is drained.


Congruence

Subsystems (3rd system level)
must function in reinforcing
manner, optimizing
coordination and information
exchange.


ETA Life Cycle

FARC Life Cycle


ETA and Trend Line


FARC

The factors mentioned on the previous slides are common among terrorist
groups.  But  they  do  not  necessarily  extend  to: 

Organized  Crime 

Insurgency 

Militias 

Civil  Wars 

There  is  no  silver  bullet;  aggregation  breeds  inaccuracy. 

Non-state  violent  actors  are  defined  by  their  acts,  not  by  their  intentions. 


"Historically, terrorism falls in a category different from crimes that concern a criminal court
judge." -Jurgen Habermas

Tools (bet you don't know who this is)


"Um dem Terrorismus kein weiteres Terrain zu überlassen, ist der Staat von vornherein
gezwungen, feste, von keiner Seite überschreitbare Grenzlinien des rechtsstaatlich
Möglichen zu ziehen."

"In its effort to leave no free ground for Terrorism, the State is, from the outset, firmly
hindered by the impenetrable barriers of its legal system." - (translation by mom)



Tools:  Group 


Descriptive: 

Self  Similarity 
Power  Laws 

Case  studies  based  upon  outputs,  not  intentions 
Fractal  Geometry? 

See  Example  Prisoner’s  Dilemma  Game  with  tit-for-tat  strategy 
Important  Characteristics  and  Take  Aways: 

Equilibrium  states  exist  (power  laws) 

Networks  are  bounded  and  constrained  (self  similarity) 

But  remember:  DESCRIPTIVE  DESCRIPTIVE  DESCRIPTIVE 

"My master had power and law on his side; I had a determined will." -Harriet Ann Jacobs

Tools:  Individual 


Psychological 
Charismatic 
Well  delimited  Identity 
Strong  perception  of  injustice 
Depression  Study 

Depression  linked  to  problem  solving  portion  of  brain 
Excessive  problem  obsessions  linked  to  depressive  feelings 
Decisions/Actions  relieve  depression  feelings 
Cycle works in both directions

The bright side of being blue: depression as an adaptation for analyzing complex problems. Andrews and Thomson

But  remember:  DESCRIPTIVE  DESCRIPTIVE  DESCRIPTIVE 

"Man is the only animal for whom his own existence is a problem which he has to solve"
-Erich  Fromm 



Tools:  How  do  we  think  about  the  problem? 


“Step  Back  Thinking” 

Move  backward  from  the  intractable  problems  until  the  issues  begin  to  fall 
within  the  capacity  to  act. 

Examples: 

1. We don't have to worry about predicting whether or not a sub-national
group  will  use  an  NBC  weapon  if  they  can't  get  one 

2.  We  don't  have  to  predict  the  supply  routes  or  behaviors  of  drug  runners  if 
there  is  no  demand  for  the  product 

3. I don't have to predict where traffic is likely to be difficult if I telecommute


"When it is obvious that the goals cannot be reached, don't adjust the goals, adjust the
action steps." -Confucius

Tools:  How  do  we  crack  the  nut? 


1. Ruthlessly track and find our own vulnerabilities

2. Catalog, don't predict

3. Requires:

   1. Discover Identity Entrepreneur as early as possible

   2. Ascertain network structure

   3. Identify strategies that

      1. Attack IE

      2. Disrupt congruence of network

      3. Shape environment


"Our limitations and success will be based, most often, on your own expectations for
ourselves. What the mind dwells upon, the body acts upon." -Denis Waitley


To  Predict  or  Not  to  Predict: 

The  Art  of  Prediction  and  the  Power  of  Games 


“No one could predict 9/11.”


"Life imitates art far more than art imitates life." -Oscar Wilde

Tools:  Wargaming? 


Catalog,  don’t  predict  means: 

1. Large n wargames (MS Flight Simulator)

2.  Virtual  world  or  virtual  reality 

3.  Environmental,  self  adjudication 

4.  Catalog  strong  outliers  (learn  vulnerabilities,  ours  and  theirs) 

Second  Life  with  guns! 

1. Real world analogs

2.  Real  world  consequences 

3.  Put  skin  in  the  game! 

"If some unemployed punk in New Jersey can get a cassette to make love to Elle
McPherson for $19.95, this virtual reality stuff is going to make crack look like Sanka."
-Dennis Miller


Thank  you  for  your  attention. 
Comments welcome.


"Quotation is a serviceable substitute for wit." -Oscar Wilde

5.6 Design and Evaluation of a Cross-Cultural Training System


Design  and  Evaluation  of  a Cross-Cultural  Training  System 

Thomas Santarelli & Kevin C. Stagl
CHI Systems, Inc.

{tsantarelli, kstagl}@chisystems.com

Cross-cultural competency, and the underlying communication and affective skills required to develop such expertise, is becoming
increasingly important for a wide variety of domains. To address this need, we developed a blended learning platform which
combines virtual role-play with tutorials, assessment and feedback. A Middle-Eastern Curriculum (MEC) exemplar for cross-cultural
training of U.S. military personnel was developed to guide the refinement of an existing game-based training platform. To complement
this curriculum, we developed scenario authoring tools to enable end-users to define training objectives, link performance measures
and feedback/remediation to these objectives, and deploy experiential scenarios within a game-based virtual environment (VE).
Lessons learned from the design and development of this exemplar cross-cultural competency curriculum, as well as formative
evaluation results, are discussed. Initial findings suggest that the underlying training technology promotes deep levels of semantic
processing of key information about the relevant cultural and communication skills.


1.0  INTRODUCTION 

Modern  political  confrontations  involving  the 
use  of  military  force  are  being  conducted  by 
multinational  coalition  partners  in  multiple, 
concurrent  foreign  theaters.  The  parties  to 
those  arrangements,  and  the  contexts  in 
which  those  operations  are  conducted,  have 
led  military  forces  to  be  increasingly 
concerned with cross-cultural training as
it pertains to the execution of warfare,
counterinsurgency,  peacekeeping  and 
reconstruction  efforts  (see  Field  Manual 
(FM)  3-07  Stability  and  Support  Operations; 
FM  3-24  Counterinsurgency  Operations). 

For  example,  recent  military  operations  in 
Afghanistan,  Bosnia  and  Iraq  have 
highlighted  the  centrality  of  cross-cultural 
acumen  to  securing  and  sustaining  stability 
and  strategic  relationships. 

The lessons learned from those global
deployments are invaluable. Recent
multinational, urban-based operations,
however, have added a new layer of
complexity, requiring soldiers to proactively
anticipate, interpret and influence the
thoughts, behaviors and affect of their
coalition partners, host sponsors and
heterogeneous populations [1]. Moreover,
transformational changes such as
network-centric warfare have also increased the
complexity of operations, mandating
increased inter-service and inter-agency
coordination and thereby greater adaptation
to fluid organizational cultures [21].


Recent  attention  has  focused  on  the 
creation  of  military  training  systems  utilizing 
gaming  technologies.  Many  such  systems 
have  been  developed  to  address  domains 
such  as  Arabic  language  training, 
intelligence  data  collection,  squad-level 
tactics,  cultural  familiarization,  and 
leadership  training  [12,  3, 16].  An 
instructional  trend  in  military  game 
development  has  followed  the  paradigm  of 
experiential  learning  [14]  which  emphasizes 
the  role  of  experience  in  learning. 

This  has  shaped  the  development  of 
training  games  to  be  highly  interactive  and 
free-form,  with  trainees  being  allowed  to 
explore  the  virtual  environment,  to  gain 
experience  through  interactions  with  virtual 
characters  and  the  performance  of  domain 
tasks,  and  to  develop  knowledge  as  the  net 
effect  of  all  interactions  with  the  training 
system.  Learning  is  largely  opportunistic 
and  no  systematic  approach  has  been 
applied to ensure adequate exposure of
trainees  to  all  training  objectives  of  a 
structured  curriculum  [20]. 

This  problem  is  further  compounded  by  the 
general  lack  or  poor  implementation  of 
typical  training  mechanisms  (e.g., 
articulation  of  training  goals  and 
performance  standards,  detailed  feedback, 
performance  remediation,  performance 
appraisal,  and  explicit  coaching  prior  to 
learning  opportunities)  within  the  current 
generation of military, game-based training
applications. These deficiencies within
many contemporary VE-based training
solutions are critical gaps, as research
suggests that if a scenario is linked with
training objectives, then trainees are more
likely to learn the underlying objectives [5].
These deficiencies were addressed in the
development of the Middle-East Curriculum
(MEC) discussed below.

2.0  BODY 

The  MEC  design  began  with  a training 
analysis  of  a number  of  sources  in  order  to 
derive  a set  of  knowledge/skill/abilities 
(KSAs).  From  those  KSAs,  a training 
objective  taxonomy  was  derived.  An 
underlying  cultural  training  model  was 
developed  and  a set  of  assessments  and 
scenarios  were  created.  This  process  is 
briefly  described  in  the  following  sub- 
sections. 

2.1  Knowledge/Skills/Abilities 

A rich source of KSA candidates was
identified in a Defense Equal Opportunity
Management Institute (DEOMI) research
report, entitled 'Toward An Operational
Definition Of Cross-Cultural Competence
From Interview Data' [18]. These data were
derived from a corpus of interviews with
Non-Commissioned Officers (NCOs) with
recent Iraq deployment experience. The
survey established a cross-cultural
competency (CCC) definition and posited a
number of factors impacting efficacy with
respect to real-world CCC. These factors
were validated through qualitative survey of
subject participants. Ross' CCC-based
KSA taxonomy included:

• Ethnocultural  empathy 

• Experience 

• Flexibility 

• Interpersonal  and 
communication  skills 

• Mental-model/perspective  taking 

• Willingness  to  engage 

• Low  need  for  cognitive  closure 

• Relationship  building 

• Self-efficacy 


• Self-regulation/emotion- 
regulation 

These  factors  were  subsumed  within  the 
MEC  training  design  and  used  to  identify 
important  cross-cultural  KSAs. 

2.2  Training  Objective  Derivation 

The  Middle  East  curricular  content  was 
developed  based  on  analyses  of  widely- 
available  cultural  training  materials.  This 
included  the  Defense  Language  Institute 
(DLI)  cross-cultural  training  materials  (see 
http://www.dliflc.edu/products.html)  as  well 
as  other  available  sources  of  Middle 
Eastern  reference  content.  The  full  set  of 
sources  for  cross-cultural  familiarization 
which  were  drawn  upon  included:  TRADOC 
Culture  Center  Arab  Cultural  Training 
Curriculum,  TRADOC  DCSINT  Handbook 
No.  2:  Arab  Cultural  Awareness,  TRADOC 
DCSINT  Contemporary  Operation 
Environment:  Actors  & Role,  DLI  Countries 
in  Perspective  (Iraq)  and  Iraqi 
Familiarization, Peace Corps Culture Matters
workbook, State Department Iraq Fact
Sheets,  and  the  CIA  World  Fact  Book 
(Iraq/Afghanistan). 

A training  analysis  was  conducted  with 
these  materials  and  a training  objective 
hierarchy  was  created  to  guide  the 
development  of  specific  curricular  content 
such  as  didactic  materials,  assessments, 
and  VE-based  scenarios.  More  importantly, 
this  training  objective  analysis  was  used  to 
develop  a cultural  training  model,  as 
described  below. 

2.3  Cultural  Training  Structure 

There  have  been  a number  of  initiatives 
conducted  to  better  understand  the  nature 
of  effective  cross-cultural  training  for 
multinational  operations.  For  example,  the 
U.S.  Army  Training  and  Doctrine  Command 
(TRADOC)  requested  the  Director  of  Center 
for  Army  Leadership,  Combined  Arms 
Center  and  the  Chief  of  the  Leader 
Development  Research  Unit  at  ARI  to  host 
and  conduct  the  Cultural  Understanding  and 
Language  Proficiency  (CULP)  research 
initiative. A workshop supporting CULP was
conducted in July 2007 at Ft. Leavenworth,
Kansas in order to provide a clearer
understanding of the cultural and linguistic
capabilities required by soldiers [2].

CULP participants were guided by a
tri-component framework of cultural capability.
The tenets of this framework suggest
cultural capability consists of (1)
cross-cultural competence and, to a lesser extent,
(2) culture-specific knowledge and (3)
language proficiency. CULP participants
were  charged  with  defining  cross-cultural 
competence,  as  it  was  deemed  the  most 
important  factor  to  mission  success,  and 
identifying  the  knowledge,  skills  and 
attitudes  required  to  achieve  intercultural 
effectiveness.  In  response  to  this  agenda, 
CULP  participants  posited  that  cross- 
cultural  competence  is  the  key  aspect  of  this 
tri-component  cultural  training  model 
because  “Leaders  will  inevitably  encounter 
situations  in  another  culture  that  do  not 
meet  their  expectations.  Even  if  provided 
with  highly  accurate  region-  or  culture- 
specific  information  in  training,  leaders  will 
not  be  able  to  anticipate  every  impact  of 
cultural  differences”  [2,  p.5].  Moreover, 
multiple,  concurrent  engagements  executed 
in  widely  diverse  areas  of  operation, 
coupled  with  limited  pre-deployment  training 
time,  often  preclude  soldiers  from  fully 
mastering  any  single  national,  regional  or 
local  language  or  dialect.  Therefore,  honing 
linguistic  skills  is  increasingly  less  important 
relative  to  training  time  for  the  development 
of  inter-  or  intra-cultural  competence. 

Based  on  these  assumptions,  coupled  with 
the  analysis  of  the  identified  KSAs  and 
training  objectives  previously  noted,  a 
cultural  training  structure  was  created  along 
three  conceptual  categories:  Culture 
General,  Culture  Applied,  and  Culture 
Specific.  Culture  General  includes  a meta- 
cognitive-based  overview  of  topics  which 
introduce  the  learner  to  broad  definitions  of 
Culture,  American  Culture,  and  the 
importance of Religion. Each of these three
categories was further sub-divided into
specific content themes based on the nature
of each, as shown in Figure 1.


Culture General
• General Concepts
• Culture Description
• Self and Culture
• American Culture
• Ethnicity and Identity
• Belief Systems
• Kinship, Marriage, Gender
• Subsistence
• Political Systems
• Conflict Resolution
• Cultural Heritage

Culture Applied
• Culture Shock
• Conversation Differences
• Social Perspective Taking
• Non-Verbal Behaviors
• Dealing with Interpreters

Culture Specific
• Religion
• Gestures/Greetings
• Hospitality
• Conversation
• Food
• Interacting with Women
• Interacting with Children
• Mosque Etiquette


Figure  1.  MEC  Structure 

This structure supports the MEC instructional
design: each theme can have a variable
number of elements (i.e., assessments,
tutorials, and scenarios), the sequence
moves incrementally from most-general (CG)
through to most-specific (CS), and each
theme is granular enough
to enable users to digest it
as an individual lesson within small periods
of training time.
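
The structure described above can be sketched as a small data model. This is a minimal illustration only, not the MEC implementation: the class names, field names, and element identifiers below are assumptions introduced for the example, and only the three categories and the three element types come from the text.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class Category(IntEnum):
    # Ordered from most general (CG) to most specific (CS).
    CULTURE_GENERAL = 1
    CULTURE_APPLIED = 2
    CULTURE_SPECIFIC = 3


@dataclass
class Theme:
    name: str
    category: Category
    # A theme may hold any number of elements of each kind
    # (the element identifiers used below are hypothetical).
    assessments: List[str] = field(default_factory=list)
    tutorials: List[str] = field(default_factory=list)
    scenarios: List[str] = field(default_factory=list)

    def element_count(self) -> int:
        return len(self.assessments) + len(self.tutorials) + len(self.scenarios)


def lesson_sequence(themes: List[Theme]) -> List[Theme]:
    """Order themes from most general to most specific.

    sorted() is stable, so themes keep their relative order within a category.
    """
    return sorted(themes, key=lambda t: t.category)


themes = [
    Theme("Mosque Etiquette", Category.CULTURE_SPECIFIC, scenarios=["mosque_entry"]),
    Theme("American Culture", Category.CULTURE_GENERAL, tutorials=["us_culture_intro"]),
    Theme("Non-Verbal Behaviors", Category.CULTURE_APPLIED,
          assessments=["nonverbal_quiz"], tutorials=["nonverbal_intro"]),
]
ordered = [t.name for t in lesson_sequence(themes)]
```

Keeping each theme as its own record is what would let a single theme be delivered as a short, self-contained lesson.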

2.4  Cultural  Assessments 

Assessments were created based on the
culture-assimilator concept [10, 15, 17].

The Culture Assimilators developed for the
MEC were modeled after a set of Arab
culture assimilators previously developed
through the Defense Language Institute
[11]. The DLI Arab culture assimilator
content is divided into five specific books
and organized around clustered themes.
This includes hospitality and conversation,
religious  practices  and  history,  behaviors 
toward  food  and  women,  greetings  and 
nuances  of  thought,  and  traditions  versus 
progress.  The  five-book  culture  assimilator 
was  a rich  source  of  material,  as  it  contains 
more  than  sixty  hypothetical  cross-cultural 
narratives  within  four-hundred  and  fifty 
pages.  This  corpus  was  analyzed  to  derive 
topical  sub-dimensions  across  the  five 
books. 

2.5  Scenario  Definition 

Scenarios  were  developed  along  a number 
of  themes  in  support  of  the  training 
objectives  identified  above.  These  scenario 
themes are briefly described below.

Negotiation of cooperation between Arab
tribal leader and U.S. Forces regarding
humanitarian support for infrastructure
rebuilding: This type of multi-party
negotiation is quite common and stresses
cross-cultural differences that can often
impede or negate positive interactions and
outcomes. In Arab cultures, decision
making is typically not a diffuse "meeting of
the minds"; on the contrary,
decision-making typically is vested in the hands of a
tribal leader. The outcome of the negotiation
is largely a function of the cultural
competency of the U.S. negotiation party.

Negotiation of cooperation between Arab
tribal leader and U.S. Forces defusing
rioting in the city after U.S. forces
mistakenly fired on a wedding party after
celebratory gun-fire: An important cultural
practice within many Arab cultures is the
celebration of key life events (e.g., funeral,
wedding, etc.) with weapon-firing. This
can create tensions and misinterpretation of
the intent of such celebrations, as they can
be perceived as threatening rather than
non-threatening events.

U.S.  military  personnel  training  indigenous 
security  and  military  personnel:  U.S.  forces 
are  commonly  tasked  with  training 
indigenous  populations’  security  and  military 
forces  to  increase  their  tactical  and  military 
capabilities for self-protection. This
scenario requires a U.S. Lt to interact with
and train a small ten-person ISF squad.
These  contexts  typically  place  an  emphasis 
on  cross-cultural  interactions  involving  a 
small  minority  of  U.S.  military  personnel  in 
contact  with  a larger  majority  of  indigenous 
personnel. Clashes between cultures are
common, including cultural differences
related to authoritarianism,
forced-compliance, tolerance, and protection of
honor in a group setting.

Entry  into  a Mosque  through  demonstration 
of  proper  Mosque  etiquette:  Given  that  the 
Mosque is a ubiquitous part of Arab religious
and cultural life, U.S. military personnel
have a large degree of contact with, and
need to enter, Mosques. U.S. military
personnel  are  in  the  tenuous  position  of 


having  to  balance  the  somewhat  conflicting 
goal  of  performing  MOUT-based  patrols, 
sweeps,  and  apprehension  of  persons  of 
interest,  with  the  need  to  show  great 
deference  for  required  etiquette  and 
practices  associated  with  entry  to,  and 
prescribed  and  proscribed  behaviors  within, 
a Mosque.  This  scenario  involves  a U.S.  Lt 
needing  to  gain  entry  into  a Mosque, 
demonstrate  appropriate  cultural 
understanding  to  remain  there,  and  elicit 
intelligence  information  from  the  Mosque 
religious  leadership  regarding  a person-of- 
interest. 

Gathering  of  intelligence  indicators  through 
interaction  with  local  populace:  Garnering 
the  support  of  the  local  populace  is  key  in 
many  military  operations  but  particularly  so 
for  intelligence  operations  in  support  of 
Commander's Critical Information
Requirements (CCIR). Respecting
conventional  cultural  norms  is  a key 
determinant  of  success  in  this  context.  This 
scenario  involves  a U.S.  Lt  interacting  with 
members  of  the  local  populace  in  order  to 
elicit  intelligence  indicators  regarding  the 
presence  of  unknown  individuals  within  the 
city,  observed  loitering  (possible 
surveillance)  near  key  infrastructure,  reports 
of  threats  or  intimidations  toward  local 
business,  and  perceptions  of  corruption  of 
local  law-enforcement. 

3.0  DISCUSSION 

We  approached  the  task  of  instantiating  the 
MEC  content  described  above  with  a 
hypothesis  that  skills  involving  person-to- 
person  communication,  cross-cultural 
dynamics,  an  understanding  of  non-verbal 
behavior  and  other  broad  types  of 
interpersonal  skills,  require  extensive 
experiential  practice  before  they  can  be 
reliably  and  independently  applied  by  the 
learner  in  a broad  range  of  everyday 
situations.  However,  providing  only 
experiential  training  via  a game-based 
virtual  environment  encounter  is  likely  to  be 
just  as  limiting  and  ineffective  as  providing 
only  lecture-based  presentation  of 
abstracted information on cultural sensitivity
or on a specific (sub-)culture. This is
because  the  game-based  experiences 
address  only  one  aspect  of  the  skill-  and 
knowledge-development  process  - that  of 
experiential  application  or  practice.  While 
essential  to  learning,  practice  must  be 
combined  with  three  other  broad  functions 
to  achieve  effective  training.  Specifically, 
practice  must  be: 

• Supplemented  with  didactic  instruction 
such  as  demonstration,  lecture  or 
presentation; 

• Guided  with  scaffolding  to  provide 
coaching  and/or  feedback  that  directly 
or  indirectly  promotes  deliberative 
learning  and  introspection;  and 

• Managed  through  formative  (pre)  and 
summative  (post)  assessments  that 
guide  the  learner’s  progress  toward  the 
learning  objectives. 

The  last  function  in  the  above  list  points  to 
the  explicit  purposiveness  of  training,  based 
on  explicit  learning  objectives.  In  the  MEC, 
the  learning  objectives  drive  not  only  the 
assessment  process,  but  also  the 
sequencing  and  management  of  the  didactic 
instruction  and  practice  process  as  well. 

The  learner  is  systematically  paced  through 
a cyclic  curriculum  of  instruction,  practice, 
and  assessment  in  a way  that  takes  the 
trainee  systematically  toward  the  goal  of 
achieving  and  demonstrating  competence  in 
the  specific  objectives  of  the  training.  Thus, 
the  learning  objectives  strongly  constrain 
the design of the practice environment (i.e.,
game; the two terms are used
interchangeably here) to ensure that it
provides  clear  opportunities  for  practice  and 
assessment  of  the  various  actions  and 
knowledge  that  the  trainee  must  acquire. 
Whatever  features  are  designed  into  the 
training  game  to  make  it  interesting, 
challenging  and  engaging,  they  must  be 
subordinated  to  these  needs  to  provide 
opportunities  to  perform  and  be  assessed  in 
the  performance  of  the  target  knowledge 
and  actions.  The  impact  of  the  objectives 
on  the  MEC  practice  environment  flows  into 
three  separate  aspects  - the  tutorials  and 


assessments  outside  the  game,  the 
scenarios  and  the  dynamics  in  the  game 
environment  itself,  and  the  behavior  of  the 
NPCs  with  whom  the  trainee  interacts  in  the 
game. The curriculum is objectives-based
and includes well-integrated didactic
instruction, game-based practice, coaching
and feedback, and an ongoing assessment
process that drives the cycle of learning
forward on an individualized basis.

3.1  Instantiating  Content 

This  Middle  East  curriculum  was 
implemented using the game-based cultural
training  architecture  VECTOR®  [9] 
previously  created  by  the  research  team 
because  it  provided  a great  degree  of 
flexibility  while  requiring  only  incremental 
modifications.  This  includes  a didactic 
learning  component  implemented  using  the 
commercial  product  Toolbook,  a game- 
based  practice  component,  a suite  of 
authoring  tools  for  creating  and  extending 
scenarios,  and  a data-base  back-end  for 
storing  trainee  performance  results. 

In  the  practice  game,  the  trainee  can 
progress  through  a series  of  scenarios, 
each  of  which  involves  interacting  with  a 
specific  physical  avatar  or  Non-Player 
Character  (NPC)  that  possesses  a specific 
set  of  cultural-behaviors  and  sensitivities. 
The interaction between the trainee and the
NPCs in the scenario is organized into
transactions, in which each party says
one thing in a turn-taking fashion. For the
trainee, each turn is represented by a
pre-defined set of utterances from which the
trainee must select one. The progress
through the scenario depends completely
on the trainee's choices, and the NPCs react
differently on each path based on their
pre-defined cultural sensitivities. Figure 2
shows an example of the dialog choices
available to the trainee during an
initial-meeting encounter with a specific NPC
named "Nabil".

Figure 2. Interacting with an NPC
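
The turn-taking transactions described above can be sketched as a branching dialog structure. This is a hedged illustration under stated assumptions, not the VECTOR® data format: the `Utterance` and `Turn` classes, the `run_scenario` function, the tag names, and the sample dialog lines are all hypothetical and exist only to show how a pre-defined set of utterances can determine the path through a scenario.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Utterance:
    text: str        # what the trainee can choose to say
    npc_reply: str   # how the NPC reacts on this path
    next_turn: str   # id of the next turn; "" ends the scenario
    tags: List[str] = field(default_factory=list)  # hypothetical objective tags


@dataclass
class Turn:
    choices: List[Utterance]  # the pre-defined set the trainee selects from


def run_scenario(turns: Dict[str, Turn], start: str, picks: List[int]) -> List[str]:
    """Play one transaction per pick and collect the NPC replies."""
    replies: List[str] = []
    turn_id = start
    for pick in picks:
        if not turn_id:  # the scenario has already ended
            break
        choice = turns[turn_id].choices[pick]
        replies.append(choice.npc_reply)
        turn_id = choice.next_turn
    return replies


# Invented sample content; the real scenario scripts are not given in the text.
turns = {
    "greeting": Turn([
        Utterance("Offer a formal greeting.", "Nabil returns the greeting warmly.",
                  "business", ["greetings"]),
        Utterance("Skip the greeting and state your business.",
                  "Nabil answers curtly.", "business"),
    ]),
    "business": Turn([
        Utterance("Ask how you can help the community.",
                  "Nabil describes local needs.", ""),
        Utterance("Demand information.", "Nabil ends the meeting.", ""),
    ]),
}
replies = run_scenario(turns, "greeting", [0, 0])
```

Because every path is enumerated in advance, each trainee choice can be tied to a specific NPC reaction and, through tags like those assumed here, to a training objective.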


The fidelity required of the NPC avatars
demanded a range of affect and expressivity
using expressive non-verbal behaviors. The
requirements for cross-cultural training
dictated that the scenarios needed to
include voice-acted speech coupled with
avatars capable of a range of para-linguistic
expressivity. Because these features were
not required in the original creation of the
VECTOR® system, this presented a
technology gap. To address this gap we
integrated a high-fidelity
character-animation and lip-syncing tool, FaceFX [13],
in order to provide highly interactive avatars
capable of conveying subtle non-verbal
cues. The use of FaceFX provides a
smooth pipeline for processing voice-acted
wav files against avatar speech (i.e., dialog)
and produces character asset files which
are then used to drive highly realistic game
avatars.

During  interactions  with  game  NPCs,  the 
trainee  is  expected  to  maintain  trust  with  the 
avatars  by  communicating  in  ways  that 
show  deference  for  the  modeled  cultural 
norms  and  communication  expectations. 

The NPC speaks via a voice-actor while the
trainee selects responses via text presented
on the screen. One of the forms of
scaffolding and feedback provided
dynamically to the trainee is a "trust meter"
(in the top left), an aggregate measure of
NPC trust based on trainee responses.
Additional  measures  of  performance  are 
calculated  and  stored  in  the  trainee 


database  for  off-line  use  by  an  instructor  or 
training  administrator.  Note  that  in  the  trust 
meter,  Nabil’s  trust  of  the  trainee  is  average. 
As the trainee interacts with any given NPC,
the trust meter will increment or decrement
according to the choices the trainee makes,
as a function of success
relative to underlying training objectives.
Additionally,  an  after-action  review  (AAR) 
summarizes  all  learning  objectives 
measured  against  trainee  performance,  per 
scenario,  as  seen  in  Figure  3. 


Figure 3. After Action Review Example
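
The trust meter and after-action review described above can be sketched as follows. This is a minimal sketch under assumptions, not the system's actual scoring logic: the 0-100 scale, the starting value of 50 for "average" trust, the per-choice deltas, and all class and objective names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TrustMeter:
    value: int = 50  # assumed scale of 0-100, with 50 standing for "average"
    low: int = 0
    high: int = 100

    def apply(self, delta: int) -> int:
        # Clamp so the meter never leaves its displayed range.
        self.value = max(self.low, min(self.high, self.value + delta))
        return self.value


@dataclass
class SessionLog:
    meter: TrustMeter = field(default_factory=TrustMeter)
    # objective name -> one True/False observation per scored choice
    results: Dict[str, List[bool]] = field(default_factory=dict)

    def record(self, objective: str, met: bool, delta: int) -> None:
        """Score one trainee choice against a training objective."""
        self.results.setdefault(objective, []).append(met)
        self.meter.apply(delta if met else -delta)

    def after_action_review(self) -> Dict[str, Tuple[int, int]]:
        """Summarize (times met, times observed) for each objective."""
        return {obj: (sum(obs), len(obs)) for obj, obs in self.results.items()}


log = SessionLog()
log.record("greetings", True, 10)      # trust rises to 60
log.record("hospitality", False, 15)   # trust falls to 45
log.record("greetings", True, 10)      # trust rises to 55
aar = log.after_action_review()
```

Recording each choice against a named objective is what would allow the same data to drive both the live meter and the per-scenario summary.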

3.2  Scenario  Authoring 

Despite  successes  in  applying  simulation 
and  serious-games  to  interpersonal  skills 
training,  scenario  content  generation 
remains  an  obstacle  to  the  cost-effective 
use  of  the  technology.  In  fact,  a common 
criticism  of  game-based  training  has  been 
the  lack  of  a systematic  approach  to  linking 
learning  objectives  to  scenario  content. 

To  this  end,  an  important  capability  was  the 
inclusion  of  an  authoring  facility.  Such  a 
facility  would  provide  three  advantages  in 
that  it: 

• Allows for systematic and repeatable
manipulation of existing scenarios to
support experimentation;

• Provides the ability for third-party end-
users to add content based on
changes in cultural conditions;

• Positions scenario creation in the
context of training-objective
articulation, performance
measurement, feedback and
assessment.

The existing VECTOR® scenario editor,
depicted in Figure 4, enabled the efficient
creation of the Middle East game-based
scenarios. The integration of instructional
design principles into the authoring process
promoted effective training scenario
instantiation [4].


Figure  4.  Scenario  Editor 

To  facilitate  consistent  scenario  creation,  a 
workflow  model  for  scenario  authoring  was 
included  within  the  existing  scenario 
authoring tool. To make scenario authoring
accessible to a wider audience (i.e., beyond
“game” engineers), the authoring tool
interface was designed around a cinematic
metaphor. Cinematic metaphors have been
used successfully in similar virtual
environments [19, 7].
Scenario  authoring  within  VECTOR® 
encompasses  a number  of  training  aspects, 
including: 

• Training  objective  specification: 
Includes  a library  of  training 
objectives  which  can  also  be 
expanded  using  the  objective  editor. 

• Scenario  information:  This  includes 
specifying  high-level  scenario 
information  such  as  authorship 
tracking  (critical  when  scenarios  are 
created  and  maintained  by  multiple 
authors),  target  trainee  population 
details,  and  other  aspects  of  the 
overall  scenario  learning  goals. 


• Environment  specification:  Includes 
the  designation  of  specific 
environment/setting  within  which  a 
scenario  will  take  place  to  support 
the  identified  training  requirements. 

• Plot  organization:  Involves  the 
creation  and  arrangement  of  an 
overall  scenario  “story”  which 
supports  the  selected  training 
objectives  and  conveys  a complete, 
coherent  scenario  to  the  trainee. 

• Vignette  creation:  Encompasses  the 
process  of  creating  detailed  dialog- 
based  interactions  and  trainee 
response  options,  linking  those 
interactions  to  training  objectives, 
specifying  feedback  and  coaching, 
and  other  measurement  details. 

• Scenario  generation:  Process  for 
reviewing  and  validating  the 
scenario  before  export  to  the  game- 
engine  “player”. 
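
The authoring aspects listed above can be pictured as a small data model with a validation pass before export. The sketch below is an illustration only and does not reflect the VECTOR® file format or editor internals; every class, field, and check here is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Vignette:
    name: str
    objective_ids: list      # training objectives this vignette measures
    response_options: list   # dialog choices offered to the trainee


@dataclass
class Scenario:
    title: str
    author: str              # authorship tracking
    environment: str         # setting in which the scenario takes place
    objectives: dict = field(default_factory=dict)  # id -> description
    vignettes: list = field(default_factory=list)   # in plot order


def validate(scenario: Scenario) -> list:
    """Return a list of problems; an empty list means ready for export."""
    problems = []
    covered = set()
    for v in scenario.vignettes:
        if not v.response_options:
            problems.append(f"vignette '{v.name}' has no response options")
        for oid in v.objective_ids:
            if oid not in scenario.objectives:
                problems.append(f"vignette '{v.name}' cites unknown objective '{oid}'")
            covered.add(oid)
    for oid in scenario.objectives:
        if oid not in covered:
            problems.append(f"objective '{oid}' is not measured by any vignette")
    return problems


scenario = Scenario(
    title="Example meeting",
    author="author-1",
    environment="village market",
    objectives={
        "OBJ-1": "open with an appropriate greeting",
        "OBJ-2": "decline an offer politely",
    },
    vignettes=[Vignette("greeting", ["OBJ-1"], ["option A", "option B"])],
)
print(validate(scenario))
```

Checking that every objective is measured by at least one vignette is one simple way to enforce the link between learning objectives and scenario content that the section argues for.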

3.3  Experimentation 

As a continuation of on-going collaborations
with the USMA at West Point, a student-
executed study using an earlier version of
the VECTOR® system and Middle East
content was conducted by Cadets at the
USMA in 2008 [6]. The goal was to address
the  issue  of  how  an  interactive  game-based 
training  experience  might  influence  the 
retention  of  training  content  in  accordance 
with  the  “depth  of  processing”  theory  of 
Craik  & Lockhart  [8].  The  findings  of  this 
field  experiment  regarding  trainee  reactions, 
knowledge  acquisition  and  knowledge 
retention  are  presented  in  the  three 
respective subsections below. All results
reported are statistically significant at p <
.05, one-tailed.

A multidimensional  training  effectiveness 
framework  was  leveraged  to  guide  the 
planning  and  conduct  of  this  field 
experiment  to  evidence  the  effects  of 
VECTOR®  relative  to  another  cultural  and 
negotiations  training  system,  known  as 
ELECT BiLAT, on trainee reactions,
learning and knowledge retention. A
between-subjects design was used to
document the differential effects of the two
training  simulator  systems  on  trainee 
knowledge  of  cultural  negotiation 
techniques. 

Trainee  procedural  knowledge  of  cultural 
negotiations  was  measured  after  trainees 
completed a given simulator-delivered
scenario-based training (SBT) event set
(viz., VECTOR® vs. ELECT BiLAT). Trainee
reactions, operationalized herein in terms of
affective, utility, engagement and difficulty
reactions, were also measured post-
training. Cadet retention of procedural
cultural knowledge was also measured
post-training.

3.3.1  Setting  and  Procedure 

Trainees  (N  = 28)  participating  in  the  study 
were  Second  Class  Cadets  enrolled  in  the 
PL300  Military  Leadership  course  at  the 
United  States  Military  Academy  at  West 
Point.  The  cultural  training  simulations,  and 
trainee  assessment  tests  and  measures, 
were  administered  in  a classroom  setting  as 
part  of  a project  led  by  two  Cadets  attending 
West Point. Trainee participants ranged in
age from 20 to 22 years.

Cadets  attending  the  leadership  course 
were  selected  on  a convenience  basis  by 
the  researchers  and  randomly  assigned  to 
complete  a simulator-delivered  SBT  event 
set. More specifically, trainees read and
signed an informed consent form and were
randomly assigned to use either the
VECTOR® or ELECT BiLAT cultural training
simulation system to complete cultural
negotiations training. Upon completion of
the  simulator-delivered  SBT  event  set, 
trainees  completed  a post-training  test  of 
cultural  knowledge  and  training  reaction 
measures.  After  finishing  training,  trainees 
also  completed  a measure  of  cultural 
knowledge  to  index  knowledge  retention. 

3.3.2  Trainee  Reactions 

In  terms  of  trainee  reactions  to  the 
simulator-delivered  cultural  training 
experience,  trainees  completing  the 
VECTOR® SBT event set reported being
more engaged during training than those
trainees exposed to ELECT BiLAT-based
training. More specifically, relative to those
Cadets exposed to ELECT BiLAT, those
trainees exposed to VECTOR® reported
more favorable engagement reactions to the
simulator-delivered cultural training (t(23) =
-2.02, p < .05). No statistically significant
differences  were  observed  between  the  two 
simulator  system  conditions  in  terms  of 
trainee  affective,  utility  and  difficulty 
reactions  to  training. 

3.3.3  Trainee  Knowledge  Acquisition 

Trainees  assigned  to  complete  the 
VECTOR®  SBT  event  set  also  acquired 
more  procedural  knowledge  of  cultural 
negotiation  techniques  than  those  trainees 
exposed  to  the  ELECT  BiLAT  training 
system.  More  specifically,  the  results  of  an 
independent-samples  t-test  analysis 
suggest  that  relative  to  Cadets  exposed  to 
ELECT  BiLAT,  those  trainees  exposed  to 
VECTOR®  had  greater  knowledge  of 
cultural  negotiation  techniques  at  the 
completion of the training session (t(24) =
-1.90, p < .05).

3.3.4  Trainee  Knowledge  Retention 

VECTOR®  trainees  also  retained  more 
knowledge  of  how  to  effectively  negotiate  in 
a multicultural  context.  The  results  of  an 
independent-samples  t-test  analysis 
suggest  trainees  participating  in  VECTOR® 
delivered  training  maintained  more 
procedural  knowledge  of  cultural  issues 
than  those  trainees  subject  to  the  ELECT 
BiLAT simulator system (t(24) = -2.17, p <
.05). Moreover, a more fine-grained
analysis of those items comprising the
retention test which were designed to test
trainees’ culture-specific knowledge also
evidenced a stronger impact of the
VECTOR® system on Cadet learning.
Relative to Cadets exposed to ELECT
BiLAT, those Cadets exposed to VECTOR®
retained more culture-specific procedural
knowledge (t(24) = -2.64, p < .01).
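
As a reading aid for the statistics reported in Sections 3.3.2 through 3.3.4, the sketch below shows how a one-tailed p-value follows from a t statistic and its degrees of freedom. It is not the authors' analysis code; the function names are illustrative, and the tail area is obtained by simple numerical integration of the Student's t density rather than by a statistics package.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def one_tailed_p(t_stat, df, steps=2000):
    """One-tailed p-value: area under the t density beyond |t_stat|.

    Integrates the density from 0 to |t_stat| with Simpson's rule (steps must
    be even) and subtracts the result from 0.5, the area of one half.
    """
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# (label, t statistic, degrees of freedom) as reported in the text
reported = [
    ("engagement reactions", -2.02, 23),
    ("knowledge acquisition", -1.90, 24),
    ("knowledge retention", -2.17, 24),
    ("culture-specific retention", -2.64, 24),
]
for label, t_stat, df in reported:
    print(f"{label}: t({df}) = {t_stat}, one-tailed p = {one_tailed_p(t_stat, df):.3f}")
```

The one-tailed framing matters for interpretation: doubling the tail area for t(24) = -1.90 gives a two-tailed value above .05, so that result depends on the directional hypothesis stated in Section 3.3.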




4.0  CONCLUSIONS 

The  shift  in  DoD  focus  from  high  intensity 
conflicts  to  the  preparation  for  Stability, 
Security,  Transition,  and  Reconstruction 
(SSTR)  operations  has  led  to  the  desire  to 
increase  the  availability  of  cultural 
familiarization  training  for  U.S.  forces.  One 
approach  that  has  generated  interest  is  the 
use of game-based solutions. In theory, the
use of game-based VEs should make
cultural training available to a wider
audience.

The work described here investigates
whether a game-based solution can use a
VE to deliver an effective cultural training
experience. We have
concluded  that  four  key  elements  are 
responsible  for  an  effective  application  of 
this  technology  towards  this  goal.  These 
four  elements  include: 

1.  Scalability allows for an application
created  for  an  initial  small  group 
training  framework  to  be  enlarged  to 
a much  greater  ‘N’  of  trainees 
without  concomitant  increase  in  cost. 

2.  Extensibility  provides  for  addition  of 
new  types  of  virtual  cases  within 
previously designed VEs, allowing
new  and  different  forms  of  norms 
and  culture  to  be  imparted  and 
assessed. 

3.  Evaluability allows for the direct
application of comparative-effectiveness
metrics to a system. Such a feature
avoids the simplistic “show-it-and-trust-it”
approach that some training programs
take to domain knowledge.

4.  Authorability gives tools to non-technical
domain experts, such as clinicians,
permitting them to populate cases
without having to supply code or
otherwise contend with excessively
technically constrained requirements.

The  VECTOR®  system  has  the  potential  to 
reduce  the  cost  of  developing  interpersonal 
skills-based  training  applications,  allow  such 
applications to be easily disseminated to
large numbers of trainees, and permit the
applications  to  be  executed  on  a wide 
variety  of  computer  hardware.  We  expect 
that  the  Army  and  Marine  Corps  will  obtain 
the  greatest  benefit  from  similar  applications 
of  cultural  training  because  personnel  in 
those  services  have  the  greatest  need  for 
direct  interaction  with  the  local  populace  in 
deployment  areas.  The  VECTOR®  platform 
will  enable  the  development  of  training 
applications  in  other  interpersonal  skills 
domains  and  therefore  will  be  a valuable 
training  application  for  Government  and 
private  sector  use. 
The purpose of this paper is to look for links in a virtual trainee’s interest and self-efficacy in a simulated
event  as  it  relates  to  their  previous  self-reported  technical  skill  level.  Ultimately,  the  idea  would  be  to 
provide  the  right  amount  of  support  at  the  right  place  at  the  right  time  to  set  the  conditions  for  maximum 
transfer  of  the  skill  sets  to  the  work  place.  An  anecdotal  recap  of  a recent  experiment  of  a medium-scale 
training  event  produced  in  a virtual  world  will  provide  examples  for  discussion.  In  July  2010,  a virtual 
training  event  was  produced  for  the  Air  Force  Research  Lab’s  Games  for  Team  Training  (GaMeTT)  at  the 
Patriot  Exercise  at  Volk  Field  in  Wisconsin.  There  were  29  EMEDS  participants  who  completed  the 
simulated  OCO  event  using  the  OLIVE  gaming  engine.  Approximately  25  avatars  were  present  at  any 
given  time;  including  role  players,  observers,  coordinators  and  participants. 


1.0  INTRODUCTION 

There  is  a growing  number  of  high  fidelity, 
well-developed,  multi-player  gaming 
engines  available  to  organizations  for  the 
purpose  of  training  individuals  and 
geographically  dispersed  teams.  These 
gaming  engines  provide  a variety  of  options 
and capabilities, from which detailed
simulated events can be conceptualized
and executed virtually. This trend toward
virtual world training opens up a plethora of
options for organizations to accomplish
learning objectives and design experiences
that allow team functioning and practice in a
safe  environment. 

Occasionally these objectives can become
overwhelming to participants with little or no
previous gaming experience. At best,
individual, experienced gamers can interact
in the virtual world with little interruption, but
participant teams are seldom composed
entirely of technophiles. The task of
supporting  the  accomplishment  of  training 
objectives  executed  in  a virtual  environment 
with  participants  of  wildly  varying  technical 
skill  sets  can  become  a barrier  to  the 
achievement  of  the  objectives. 


With  this  idea  in  mind,  participants  of  an 
experiment with a simulated event in a
virtual world were recently asked questions
related to their technical skill level, interest
level in virtual training and self-efficacy.

The  purpose  of  this  paper  is  to  report  upon 
the responses and look for links in a virtual
trainee’s interest and self-efficacy in a
simulated event as it relates to their
previous self-reported technical skill level.
Ultimately, the idea would be to provide the
right amount of support at the right place at
the right time to set the conditions for
maximum transfer of the skill sets to the
work place. An anecdotal recap of a recent
experiment of a medium-scale training
event produced in a virtual world will provide
examples for discussion.

2.0  BODY 

The experiment took place at the close of
the above-mentioned Patriot Exercise on
July 20th and 21st in Building 533 at Volk
Field near Tomah, Wisconsin. This
particular  building  is  on  the  flight  line  of  the 
field  and  consists  of  several  large  rooms 
connected  by  halls  with  smaller  office 
spaces.  The  rooms  were  equipped  with 
ample  seating,  though  minimal  desk  space 
for computer setup. The building contained
a SIPRNet connection, which was unavailable
for this unclassified experiment. However,
the sponsors of the event were able to
secure two DSL connections, which were
temporarily routed to the building and
became the uplink for the Ethernet network
that engineers stood up for the event.

Since wireless connections were not
allowed, a router was connected to the DSL
lines and Ethernet connections were
established for each of the 13 laptops used
by participants and exercise support
personnel. Two basic
configurations were used during the
experiment. The first configuration was a
simple semicircle, with a trainer station in
the center and a screen behind the trainer
stand.


The  second  configuration  was  needed  in 
order  to  simulate  a geographical  distance 
between  players,  as  would  be  the  case  in 
ultimate  use  of  the  GaMeTT  Training 
System.  So,  after  some  initial  training  was 
completed,  participants  were  scattered 
throughout  the  facility  in  various  rooms  and 
offices. To make this happen, the network
configuration above was broken down and
moved to create various access points. The
new seating/network arrangement was
designed to produce as much space as
possible between participants, mimicking
the way in which users would connect from
work or home. See the diagram below for
this configuration.


2.1  Details  of  the  Event 

There  were  three  groups  (Wednesday  PM, 
Thursday  AM  and  Thursday  PM)  brought  in 
by  van  at  the  close  of  the  second  week  of  a 
two-week live training event for forward
medical  teams.  It  is  noteworthy  that  each  of 
the  29  participants  arrived  in  the  building 
after  almost  two  weeks  of  live  exercises 
which  continued  24  hours  a day  for  10  days. 
Each  of  the  three  groups  was  composed  of 
individuals  ready  to  play  the  following  roles 
in  the  simulated  event: 

Nurse  (2) 

Doctor  (1) 

Administrative  Officer  (1) 

Administrative  Technician  (1) 

Medical  Technician  (3) 

Each of these medical personnel was given
a 30-page user guide with quick reference
charts and a live, one-hour intensive
instructional session on the basics of
operating their avatar and the functionality
of the virtual world. This training was
conducted in the semicircle shown above.
Additionally, one support person for every
three participants was available to answer
questions.

Operation  of  the  simulation  required  the 
movement  of  an  avatar,  movement  of 
objects  in-world  and  the  use  of  a series  of 
menus  to  access  the  medical  treatment 
model  for  trauma  cases  that  were  a part  of 
the  simulation. 


After the initial training event, the
participants were separated into the second
configuration in order to simulate a
distributed environment. In this setup, a
minimum of one live support person was
provided for every four participants.

During the execution of the event, it must be
noted that the server housing the virtual
environment went down twice and the DSL
Internet connection feeding the
experimental network service was
interrupted a minimum of three times.


2.2  Evaluation Criteria

The overall evaluation methodology was
based on Gagne’s Nine Instructional
Events. Gagne proposed that if learning
content contained nine significant elements,
the optimum conditions for learning would
be created for the transfer of learning from
the training environment to the real world.
The graphic below represents these nine
events (Gagne, 1985).

This approach to the assessment, and
ultimately to the evaluation of the assessment
results, takes into account the fact that the
transfer of training knowledge, skills and
abilities can be difficult to achieve and
measure in any setting. Furthermore, it
acknowledges that the use of an inquiry-based
approach provides a holistic and iterative
developmental process for solving the
multiple variables of creating a successful
simulated event in a virtual world.


Gagne’s  hierarchical  model  builds  from  the 
lowest  level  (Gaining  attention)  and  works 
its way up to “Enhanced retention and
transfer” so that simulation architects can
ensure  that  the  optimum  conditions  for 
learning  have  been  created.  For  example,  it 
is  important  to  build  in  environmental 
objects  and  training  cues  that  get  the 
learner’s  attention  before  attempting  to 
share the learning objectives. Simulation
architects, engineers, and instructional
designers can use the methodology’s
guidelines and checklists to assist with
prioritization and to distinguish desired
from required elements.
Tools such as checklists and guidelines can
help streamline concurrent development
occurring in three or four related but distinct
design fields: engineering, instructional,
graphical and logistics (Gagne, 1977).
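
One way to picture the checklist idea is as an ordered list that is walked from the lowest level upward. The sketch below is an illustration, not a tool used in the experiment: the event names follow common summaries of Gagne's nine events rather than the paper's graphic, and the function name is hypothetical.

```python
# Gagne's nine instructional events, lowest level first (wording assumed
# from common summaries of Gagne's work).
GAGNE_EVENTS = [
    "Gain attention",
    "Inform learners of objectives",
    "Stimulate recall of prior learning",
    "Present the content",
    "Provide learning guidance",
    "Elicit performance",
    "Provide feedback",
    "Assess performance",
    "Enhance retention and transfer",
]


def first_unmet_event(addressed):
    """Return the lowest-level event not yet addressed, or None if all are."""
    for event in GAGNE_EVENTS:
        if event not in addressed:
            return event
    return None


# Hypothetical design status for a simulated event under development.
addressed = {"Gain attention", "Inform learners of objectives", "Provide feedback"}
print(first_unmet_event(addressed))
```

Because the events may be built out of order, a check like this reports the lowest gap rather than assuming that everything below the highest completed event is already in place.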

The  idea  is  to  use  a short  cycle  of  feedback 
and  assessment  to  influence  the 
development/execution  of  the  simulated 
event  as  opposed  to  an  independent 
examination  that  is  conducted  only  once  at 
the close of the project. Virtual world
development is too intricate and multi-
variable to be reduced to a single snapshot
in time that produces a yes-or-no answer.

Meeting the requirements of Gagne’s
Nine Instructional Events can be more
complex than it may seem because even
though their effect is hierarchical, their
creation may be neither contiguous nor
chronological.

The  survey  responses  included  in  this  paper 
are part of this nine-layer approach to
evaluating  simulated  events  implemented  in 
virtual  worlds.  In  the  Pre-Event  Survey,  the 
participants  were  asked  to  self-report  their 
technical  skill  level  directly  as  well  as  in  a 
series  of  questions  designed  to  give  a 
relative  sense  of  their  technical  abilities  in 
relationship  to  others.  Participants  were 
also  asked  to  rate  their  interest  level  in  the 
concept  of  conducting  training  in  a virtual 
context. 

In  the  Post-Event  Survey,  participants  were 
asked  if  they  felt  they  would  apply  their 
newly  acquired  skill  sets  on  the  job. 
Research shows that this concept of self-
efficacy is a strong indicator of the transfer
of  skill  sets  from  a training  environment  to 
the  workplace. 

Learner characteristics that support transfer
are self-efficacy, pre-training motivation and
perceived utility (Bandura, 1994). In fact, a
number of recent studies indicate that self-
efficacy is the primary indicator of whether
or not participants will experience increased
performance once returning to the work
place. Further, the complex task of
measuring performance improvement on
the job and attributing that performance to
causal factors related to training events can
be short-circuited by simply asking the
participant how useful they found the
learning (Grille, 2000). Similarly, asking
learners  about  their  reasons  for  participating 
in  a learning  experience  (are  they  motivated 
to  learn)  and  whether  or  not  the  material  is 
applicable  to  their  work  (will  they  use  the 
skill  on  the  job)  is  a positive  indicator  that 
transfer  will  occur. 

3.0  DISCUSSION 

Among all of the aforementioned details,
there are several caveats. The first is that
if the experiment were extended to include
participants who are dispersed
geographically, all training and support
would be done using the web and/or a VoIP
connection. Training could possibly be
conducted via webinar, which still means
participants would need to ready their
equipment with the appropriate downloads
and hardware to run the virtual environment.

This  particular  experiment  was  conducted 
live  because  the  participants  gathered  for  a 
live  training  exercise  and  were  a “captive” 
audience  on  whom  the  tool  could  be  tested. 

Results of the participant Pre-Event Survey
show that 64% reported themselves as
technically proficient and 69% indicated that
they appear tech-savvy to their peers. The
similarity of these numbers indicates some
agreement between the direct and less
direct questions regarding technical skill
level. Also, there were 5 individuals who
rated themselves the lowest possible
number on the technically proficient scale
and 4 individuals who rated themselves the
highest possible number.

When  asked  whether  or  not  they  were 
interested  in  learning  more  about  virtual 
worlds,  13  individuals  or  46%  said  they 
highly  agreed,  agreed,  or  were  neutral. 
Likewise, 57% of the participants believed
that  virtual  training  can  be  effective  for  their 
team.  Forty-six  percent  said  they  would  be 
willing  to  train  virtually  when  they  return 
from  the  live  exercise  and  64%  indicated 
they  have  high  expectations  for  simulated 
events  in  virtual  worlds. 

On  the  Post-Event  Survey,  a consistent 
64%  said  they  would  apply  the  skills  learned 
in  the  simulated  event  to  their  work. 

4.0  CONCLUSION(S) 

These  results  are  incredibly  consistent  and 
remarkably  unremarkable  considering  the 
diversity  of  the  group.  The  numbers  may  be 
interpreted  to  say  that  about  the  same 
number  of  highly  tech-savvy  individuals 
have  high  expectations  and  plan  to  apply 
their  skills  learned  in  the  virtual  world  to  the 
real world. These results span the Pre- and
Post-Event surveys and therefore show little
or no change in attitude.

There are two additional questions from the
Post-Event Survey that may reveal some
interesting attitudes surrounding this
simulated event in a virtual world.

When  asked  “During  the  training  session,  I 
was  provided  with  enough  support  to  be 
able  to  adequately  use  the  technology”,  only 
29%  of  the  participants  highly  agreed  or 
agreed with the statement. Further, when
asked to agree with the statement
“Someone was available to answer my
questions about the virtual world used in
this training session,” a mere 5 individuals
or 17% highly agreed or agreed.

These low numbers raise the question: if
the  simulated  event  was  executed,  and 
participants  were  able  to  learn  skill  sets  that 
would  be  applied  to  the  work  place,  how  is  it 
that  the  users  did  not  feel  supported? 
Another  interesting  question  would  be,  what 
types  of  support  would  be  needed  if 
participants  were  geographically  dispersed 
and  there  was  no  tech  coach  to  stand  over 
their  shoulder? 

In  the  end,  the  argument  could  be  made 
that  the  low  numbers  for  support  indicate 
the  correct  amount  was  provided  for  an 
onsite  event,  since  eventually  the  model 
calls  for  the  skills  to  be  learned  either 
virtually  or  from  a tutorial.  Perhaps  the  low 
numbers  were  a good  thing  because  if  there 
had been too much hand holding, it would
not have been replicable in a virtual
environment.

In conclusion, providing the right amount of
support for live and virtual events can be
complex, at a minimum. Deploying a virtual
environment  for  training  can  require 
providing  a replicable  model  for  support  that 
addresses  a number  of  skill  levels  and 
learning  styles.  Numbers  can  be  deceiving: 
high  agreement  with  a support  question 
could  mean  that  the  level  of  support  cannot 
be  replicated  in  a distributed  environment. 

This delicate balance is necessary to
avoid turning the virtual technology into a
barrier rather than a tool.

5.8  A Systematic Approach for Engagement Analysis under Multitasking

Abstract. An overload condition can lead to high stress for an operator and further cause substantial drops in performance. On the
other extreme, in automated systems, an operator may become underloaded, in which case it is difficult for the operator to maintain
sustained attention. When an unexpected event occurs, either internal or external to the automated system, a disengaged operator
may neglect, misunderstand, or respond slowly/inappropriately to the situation. In this paper, we discuss a systematic approach to
monitor for extremes of cognitive workload and engagement in multitasking environments. Inferences of cognitive workload and
engagement are based on subjective evaluations, objective performance measures, physiological signals, and task analysis results.
The systematic approach developed in this paper aggregates these types of information collected under the multitasking
environment and can provide a real-time assessment of engagement.


1.0  INTRODUCTION 

Human  operators  play  an  important  role  in 
aviation  and  other  safety  critical  missions.  In 
existing  aviation  systems,  the  Operator 
Functional  State  (OFS)  is  usually  not 
monitored  and  remediation  is  not 
implemented.  In  practice,  two  types  of 
hazardous  states  of  awareness  are  likely  to 
lead  to  human  errors  [1]:  a stress  state  due 
to  high  cognitive  workload  (we  do  not 
consider  physical  workload  in  this  research) 
or  a complacent/bored  state  in  extremely 
low  workload  situations  for  a prolonged 
period of time [2]. Existing research has
found that proper assessment of
the cognitive workload and appropriate task
mitigation in overload conditions offer
potential to improve mission effectiveness
and aviation safety [3]-[5].

On  the  other  hand,  the  disengaged  state 
developed  in  low  workload  conditions  has 
not  received  equal  attention. 
Disengagement  is  usually  accompanied  by 
poor situational awareness, which can lead
to severe consequences in the multi-tasking
aviation domain. This is especially true in a
typical commercial flight scenario, which
has periods of high workload during pre-
flight preparations, takeoff and landing, with
long periods of very low workload as the
pilot cruises en route toward the destination
with the aircraft on autopilot. Pilots can
easily become disengaged during the en route
phase, as they may be less attentive under
low workload. When unexpected events
occur, disengagement from the tasks being
performed could lead to operational errors.
Such events could include unexpected
changes in weather (turbulence, for
example), equipment failure/malfunction
(such as hydraulic pump failure) or potential
collisions with other aircraft.

Therefore, the primary focus of this
research is to provide a real-time
engagement/disengagement assessment
mechanism. For this purpose, we will start
with a study of the relationship among
workload, engagement and performance
and identify the causes of low engagement
status (low workload) and its effects
(impaired performance). To better train an
engagement assessment model, we will
design a mechanism to identify the
engagement ground truth based on different
sources of information (performance
measures, subjective evaluation,
physiological signals, and task/workload
analysis), followed by a committee machine-
based real-time assessment model,
and demonstrate the concept
with a hypothesized dataset.

The  remainder  of  the  paper  is  organized  as 
follows.  Section  2 describes  the  relationship 
among  workload,  engagement,  and 
performance.  Section  3 presents  a 
mechanism  to  determine  the  ground  truth 
for  engagement  modeling.  Section  4 
describes  an  enhanced  committee  machine- 
based  real-time  engagement  assessment 
model.  Section  5 shows  preliminary 
simulation  results.  Section  6 concludes  the 
paper. 

2.0  WORKLOAD, ENGAGEMENT, AND
PERFORMANCE 

The  relationship  between  task  engagement 
and  performance  has  been  an  active  area  of 
research  for  over  20  years  [6].  Researchers 
describe  a state  of  high  performance  called 
“flow,”  which  occurs  when  people  are 
performing  challenging  tasks  for  which  they 
have  high  skill.  When  a person’s  skill 
exceeds  the  task  challenge,  it  is  very  likely 
he/she  may  become  bored  and  disengaged. 
Performance  can  suffer  if  the  imposed 
workload  is  greater  than  the  resource  that 
an  operator  can  afford. 

Workload  and  engagement  are  closely 
related.  An  optimal  workload,  one  in  which 
an  operator  performs  challenging  tasks 
within  his  or  her  abilities,  leads  to  high 
levels  of  engagement  and,  accordingly,  high 
levels  of  performance.  If  workload  exceeds 
an  operator’s  capacity,  he/she  will  be 
overloaded  and  the  performance  will  drop 
eventually. By the same token, if an
operator is under a low workload condition
for a prolonged period of time, he/she will
usually drift into a disengaged state and
performance will accordingly decrease after
a certain amount of time; this low-workload
case is the focus of this paper.

Therefore, performance can usually be
affected by both workload and engagement.
We define performance as a function of
imposed workload (WLimposed), workload
capacity (WLc), engagement (E), and
efficiency (Eff). These terms are defined as
follows:

• Imposed  workload  is  typically  what  is 
provided  to  the  operator  and  consists  of 
what  objectives  need  to  be  met,  a period 
of performance (e.g., a deadline or a
length  of  time  an  activity  must  continue), 
criteria  for  success  or  quality  (i.e.,  how 
the  work  may  be  evaluated),  and  other 
constraints  that  apply  (e.g.,  what 
resources  or  people  a person  has 
available  or  whether  a failure  occurs  in 
the  system  and  how  an  operator  is 
qualified). 

• Workload  capacity  of  an  individual  can 
change  due  to  physical  fitness  (sleep 
loss,  sickness,  etc.)  or  training. 

• Engagement  is  how  much  attention  an 
operator  puts  in  a task. 

• Efficiency  is  usually  determined  by  how 
efficiently  he/she  accomplishes  a task. 

The ratio between the imposed workload
and the workload capacity basically
determines whether an operator is
overloaded or underloaded:

$\alpha = WL_{imposed} / WL_{c}$ (Equation 1)

If α is greater than 1, the task requirements are
greater than the person’s processing
capacity, and his or her workload is high;
whereas if α is at or around 1, workload is
appropriate for the worker. However, if α is
well below 1 (for example, < 0.5), workload
is too low for the operator.

With the terms defined, performance can be
derived based on the difference between
the imposed workload and the effective and
engaged processing workload (WLee),
which is determined by the capacity,
engagement and efficiency:

$WL_{ee} = WL_{c} \cdot E \cdot E_{ff}$ (Equation 2)

If  the  imposed  workload  WLimposed  is  greater 
than  the  effective  and  engaged  processing 
workload  WLee,  the  performance  can  be  low 
due  to  the  insufficient  instantaneous 
processing  power  to  meet  the  task 
requirements;  otherwise,  performance  can 
be  satisfactory. 
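
The comparison described by Equations 1 and 2 can be illustrated with a minimal sketch. It assumes that engagement and efficiency are expressed as fractions in [0, 1] (the text does not fix their scale for Equation 2), and all function names and numeric values are hypothetical.

```python
def challenge_ratio(wl_imposed: float, wl_capacity: float) -> float:
    """Equation 1: ratio of imposed workload to workload capacity."""
    if wl_capacity <= 0:
        raise ValueError("workload capacity must be positive")
    return wl_imposed / wl_capacity


def effective_engaged_workload(wl_capacity: float, engagement: float,
                               efficiency: float) -> float:
    """Equation 2: WLee = WLc * E * Eff.

    Assumption: engagement and efficiency are fractions in [0, 1].
    """
    return wl_capacity * engagement * efficiency


def performance_is_satisfactory(wl_imposed: float, wl_capacity: float,
                                engagement: float, efficiency: float) -> bool:
    """True when the effective and engaged workload covers the imposed workload."""
    wl_ee = effective_engaged_workload(wl_capacity, engagement, efficiency)
    return wl_imposed <= wl_ee


# Hypothetical cruise-phase values: low imposed workload, same capacity and efficiency.
alpha = challenge_ratio(30.0, 100.0)
engaged_ok = performance_is_satisfactory(30.0, 100.0, 0.8, 0.9)
disengaged_ok = performance_is_satisfactory(30.0, 100.0, 0.2, 0.9)
```

With these hypothetical values the ratio is well below 1, yet only the engaged operator has enough effective and engaged workload to cover the task, which is the low-workload failure mode studied here.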

Although performance can be affected by
these four factors (capacity, engagement,
efficiency, and imposed workload), in
practice real-time performance variation is
not likely due to efficiency changes, since
experienced operators generally use
efficient strategies. Also, there is a plethora
of research on workload capacity (such as
fatigue) and overload conditions. Therefore,
in this research, we focus only on
engagement under low workload, and task
performance is used as an indicator of the
engagement state.

3.0  ENGAGEMENT  GROUND  TRUTH 

Before  an  engagement  assessment  model 
can  be  deployed,  it  needs  to  be  trained 
based  on  the  engagement  ground  truth  and 
corresponding  input  information 
(physiological  signals,  performance,  and 
others).  However,  there  does  not  exist  a 
sensor  to  provide  engagement  ground  truth; 
instead,  engagement  ground  truth  is  often 
derived  based  on  all  available  information, 
including  workload  analysis,  subjective 
evaluations,  performance  measures,  and 
physiological  measures.  In  each  of  the  four 
types  of  information,  different  characteristics 
exist. 

First,  as  discussed  before,  disengagement 
usually  occurs  under  low  workload 
conditions,  and  therefore,  workload 
information  shall  be  utilized  to  assess  the 
engagement  state.  Cognitive  workload 
analysis  under  multi-tasking  environments 
can  provide  a direct  and  continuous 


measure  of  the  tasks  being  performed.  We 
can  hypothesize  that  an  operator  in  a low 
workload  condition  for  a prolonged  period  of 
time  may  become  disengaged.  It  is 
important  to  note  that  the  cognitive  workload 
analysis  is  an  objective  measure  of  the  task 
requirements  and  it  cannot  account  for 
many  other  factors,  such  as  individual 
variations  and  environmental  conditions. 

Second, subjective evaluation during
missions is not suitable for identifying
engagement ground truth (nor for real-time
assessment), since in-mission
subjective evaluation requires interaction
with the operator being monitored, which
would affect the operator’s state artificially.
Instead, a subjective evaluation after the
mission can be directly utilized to assess
the operator’s engagement state while he/she was
performing the task. For example, recall of
task/scenario events and a question such as “did
you feel you were engaged while doing the
task?” can indicate whether
the operator was engaged during the task.

Third,  performance  measures  can  reflect 
the  effects  of  individual  characteristics  and 
contextual  information  (including  system 
setup,  hardware/software  issues,  etc.)  on 
engagement. Similar to in-mission
subjective evaluation, intrusive performance
measures, such as the fatigue-related
Psychomotor Vigilance Test (PVT), a
sustained attention task requiring subject
response to an isolated target, are also not
suitable for engagement assessment.
However,  non-intrusive  performance 
measures,  such  as  reaction  time  to  Air 
Traffic  Controller  (ATC)  communications, 
can  be  used  as  a good  engagement 
indicator;  slow  response  times,  requests  for 
clarification,  and  errors  in  readback  could  be 
associated  with  a disengaged  state. 

Finally, selected physiological measures
can indicate when an operator is in a
disengaged state. For example, a
disengaged pilot during the enroute phase
may have longer fixation durations and/or
increased saccade length due to decreased
workload [7]. Other physiological measures
include EEG readings, facial analysis, body
posture, pressure readings from a
pressure-sensitive mouse or other equipment (to
measure stress levels), and a
wristband that also measures stress.
Previous research using facial analysis,
posture, the mouse and the wristband
shows that these physiological measures
can correlate up to 0.78 with subject reports
of engagement during a task (every five
minutes) [8]. Adding EEG could potentially
strengthen this measure, and by adding eye
tracking we can get extra evidence as to
whether the pilot is actually attending to the
unexpected event.

Based  on  the  characteristics  of  these  kinds 
of  information,  the  ground  truth  finding 
procedure  can  be  described  as  follows: 

1) Analyze cognitive workload for the
task(s) being performed.

Outcome: WLimposed, a continuous
measure of the cognitive workload (0-100)
induced by the tasks being performed.

2) Performance evaluation: we will derive a
performance-based engagement score
based on collected performance
measures (continuous and/or discrete;
for example, a relatively long reaction
time to the ATC communications would
probably indicate a disengaged state). If
only discrete performance measures are
available, interpolation can be used to
derive the performance-based
engagement scores in between.

Outcome: a performance-based
engagement score (0-100).

3) Fusion of cognitive task analysis results
and performance-based engagement
scores. Different fusion techniques can
be adopted to combine the cognitive
task analysis and performance
evaluation results. A simple example is
a set of fuzzy rules to fuse the imposed
cognitive workload and the performance
loss, such as

a. If WLimposed is high and performance
is high, the engagement score is
high;

b. If WLimposed is high and performance
is low, the engagement score is
medium; and

c. If WLimposed is low for a certain
duration and the performance-based
engagement score is low,
engagement is low.

Outcome: Eo, a continuous objective
measure of engagement (0-100).

4)  Utilize  critical  physiological  signals  and 
features  to  indicate  a disengaged  state. 
Please  note  that  this  step  only  identifies 
potential  disengaged  state  indicators 
during  a mission,  such  as  yawning  and 
long  fixation  duration. 

Outcome: a discrete objective
engagement score (Edo, 0-100)
representing how well an operator is
engaged in a task based on critical
disengaged physiological signs.

5) Analyze subjective evaluation results.
There may be more than one subjective
evaluation measure. In this case, we
will first fuse these different discrete
subjective evaluation results.

Outcome: Eds, a discrete subjective
measure of the engagement (0-100).

6) Calibrate the continuous objective
engagement (Eo) with the subjective
engagement assessment (Es = {Edo,
Eds}). If at the same time instant, both
Edo and Eds are available, they will first
be combined before calibration
(weighted sum, for example).

Outcome: Engagement (E), a
continuous overall measure (0-100):

$E = E_o + \mathit{weight} \cdot (E_s - E_o) \cdot \exp(\cdot)$ (Equation 3)

where t0 is the time instant when the
subjective engagement result is available;
weight (0~1) is the confidence level of the
subjective assessment result relative to the
objective assessment result; and F is a
forgetting factor in the exponential term that
controls how long the
subjective evaluation assessment result will
impact the final outcome.

With the above procedures, a continuous
engagement measure can be derived
considering different sources of information.
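
The rule-based fusion in step 3 can be sketched as follows. This is only an illustration under stated assumptions: the membership functions (linear ramps between 30 and 70), the rule consequents (90, 50, 10), the weighted-average defuzzification, and the fallback used when no rule fires are all hypothetical choices, since only the three example rules themselves are given.

```python
def high(x: float) -> float:
    """Membership of x (0-100) in 'high': 0 below 30, 1 above 70 (assumed ramp)."""
    return min(1.0, max(0.0, (x - 30.0) / 40.0))


def low(x: float) -> float:
    """Membership of x (0-100) in 'low', taken as the complement of 'high'."""
    return 1.0 - high(x)


def fuse(wl_imposed: float, perf_score: float, low_wl_sustained: bool) -> float:
    """Fuse imposed workload and performance-based score into Eo (0-100)."""
    rules = [
        # Rule a: workload high AND performance high -> engagement high.
        (min(high(wl_imposed), high(perf_score)), 90.0),
        # Rule b: workload high AND performance low -> engagement medium.
        (min(high(wl_imposed), low(perf_score)), 50.0),
        # Rule c: workload low for a certain duration AND score low -> engagement low.
        (min(low(wl_imposed), low(perf_score)) if low_wl_sustained else 0.0, 10.0),
    ]
    total = sum(strength for strength, _ in rules)
    if total == 0.0:
        # Case not covered by the three example rules; falling back to the
        # performance-based score is an assumption of this sketch.
        return perf_score
    return sum(strength * value for strength, value in rules) / total
```

For instance, `fuse(10.0, 10.0, True)` fires only rule c and yields a low score, whereas the same inputs with `low_wl_sustained=False` fall through to the fallback.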

The goal is to use the subjective
assessment to enhance the overall
engagement assessment accuracy. In real
operations, objective assessment results
can be obtained frequently, depending on
the computer processing speed. On the
other hand, subjective assessment results
are only available at longer time
intervals because the assessment process
is intrusive. In other words, the overall
engagement mechanism is multi-variate.
The basic idea behind Equation 3 is as
follows. In the absence of subjective
assessment results, we rely solely on
the objective results, which are more
frequently available. When the subjective
assessment is available at t0, we use
Equation 3 to calibrate the final engagement
assessment result. Depending on our
confidence level in the subjective assessment,
we choose a proper weight, or the weight
can be trained with the training data. If
the weight is 0, E = Eo, which means that the
subjective assessment of engagement is
totally discounted and the engagement
result relies fully on the objective
measurements. On the other hand, if the
weight is at the other extreme of 1, E = Es,
meaning the engagement is determined
solely by the subjective assessment at the time
the subjective assessment is introduced.
Even with a confidence level defined
(weight), to reduce the bias from the
subjective assessment, we introduce an
exponential term in Equation 3, with which
the bias is exponentially discounted
(controlled by the forgetting factor F).
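
The calibration idea behind Equation 3 can be sketched in a few lines. The exact form of the exponential term is not recoverable here, so this sketch assumes a decay of exp(-(t - t0) / F) and assumes that the subjective/objective difference is evaluated at t0; the function name and argument layout are hypothetical.

```python
import math
from typing import Optional, Tuple


def calibrate(e_obj_now: float, t: float, weight: float, forgetting: float,
              subjective: Optional[Tuple[float, float, float]] = None) -> float:
    """Calibrate the objective engagement with a subjective assessment.

    subjective is an optional tuple (t0, Es at t0, Eo at t0).
    Assumed decay: exp(-(t - t0) / forgetting), with forgetting > 0.
    """
    if subjective is None:
        return e_obj_now  # no subjective result yet: rely on objective only
    t0, e_subj_t0, e_obj_t0 = subjective
    if t < t0:
        return e_obj_now  # subjective result not yet available at time t
    decay = math.exp(-(t - t0) / forgetting)
    return e_obj_now + weight * (e_subj_t0 - e_obj_t0) * decay
```

Under these assumptions, a weight of 0 returns the objective value, a weight of 1 returns the subjective value at t = t0, and the correction fades as t moves away from t0.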


4.0 REAL-TIME  ENGAGEMENT 
ASSESSMENT  MODEL 

The basic procedure for real-time operator
functional state (OFS) assessment is shown in Figure 1.

Figure 1: Real-time OFS assessment
procedure

It  is  similar  to  the  engagement  ground  truth 
finding  procedure  described  in  Section  3. 
However,  in  real-time  aviation  applications, 
we  cannot  rely  on  manual  selection  of 
physiological  features  that  can  indicate  a 
disengaged  state,  such  as  eye  fixation 
duration  and  Heart  Rate  Variability  (HRV). 
Instead,  the  physiological  signals  are  being 
continuously  monitored  and  the  variation  of 
engagement  is  automatically  determined  by 
a model  trained  with  the  physiological 
features  and  the  identified  engagement 
ground  truth  using  a set  of  training  data. 
The  output  of  the  real-time  assessment 
model  is  an  objective  assessment  of 
engagement. 

Two sources of discrete information are
utilized to “calibrate” the objective
engagement assessment: the imposed
workload (especially low workload for a
prolonged period of time) and non-intrusive
performance measures (mostly discrete in
commercial aviation applications, such as
reaction time and errors associated with
ATC communications). Fuzzy rules similar
to those used in ground truth finding can be
applied to derive a discrete-time evaluation
of engaged/disengaged state.

Again, the final engagement assessment is
based on a calibration of the objective
evaluation using the difference between
objective and subjective evaluation results
modulated by a forgetting factor.

It is worth noting that an enhanced
committee machine method has been
proposed by the authors in [9]. In this
research, we will apply the same technique
to build the objective/continuous
engagement assessment model. The
enhanced committee machine method is
able to address large OFS individual
variations by selecting the committee
members and features that are the most
sensitive to the OFS of each individual. The
method has been verified and
validated with a driving test data set, with the
mean squared error of OFS estimation
decreasing significantly (by around
20%) compared to that without
individualization [9].
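
To make the general idea of a committee of models concrete, the following is a deliberately simplified sketch, not the enhanced committee machine of [9]. It assumes each committee member is a one-feature least-squares line trained on a bootstrap resample, with a naive per-member feature choice (lowest squared error) standing in for the feature selection used in [9]; all names and data are hypothetical.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Least-squares slope and intercept for a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        return 0.0, my  # constant feature: predict the mean target
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def squared_error(xs, ys, slope, intercept):
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def train_committee(rows, targets, n_members=5, seed=0):
    """Train members on bootstrap resamples; each keeps its best single feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        ys = [targets[i] for i in idx]
        best = None
        for f in range(n_features):
            xs = [rows[i][f] for i in idx]
            slope, intercept = fit_line(xs, ys)
            err = squared_error(xs, ys, slope, intercept)
            if best is None or err < best[0]:
                best = (err, f, slope, intercept)
        members.append(best[1:])  # (feature index, slope, intercept)
    return members


def predict(members, row):
    """Committee output: average of the member predictions."""
    return mean(slope * row[f] + intercept for f, slope, intercept in members)


# Hypothetical data: feature 0 drives the target, feature 1 is unrelated.
rows = [[0.5 * i, float((7 * i) % 5)] for i in range(20)]
targets = [100.0 - 8.0 * r[0] for r in rows]
members = train_committee(rows, targets)
estimate = predict(members, [5.0, 3.0])
```

Individualization, in this simplified picture, would amount to training or selecting members per operator so that the retained features are those most sensitive for that individual.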

5.0 SIMULATION RESULTS

In  this  paper,  we  generated  a simulation 
dataset  based  on  the  flight  information  of 
AAL1238  on  05/12/2010,  from  Seattle  to 
Chicago  O’Hare,  to  illustrate  the  developed 
engagement  assessment  method.  The 
basic flight information was extracted from
[10]. The altitude change along
the flight is shown in Figure 2.


Figure 2: Flight AAL1238 on 05/12/2010:
altitude vs. time

Several  assumptions  were  made  to  adopt 
this  flight  for  the  proof-of-concept  purpose: 


1)  The  imposed  workload  was 
constantly  high  during  take-off  and 
landing 

2)  The  imposed  workload  was  low 
during  cruising  (altitude  above  30k 
ft). 

3)  Additional  assumptions: 

a.  Both  an  engaged  pilot  and  a 
disengaged  pilot  reported 
disengaged  from  1:45PM  to 
2:15PM. 

b.  Reaction  time  of  an  engaged 
pilot  is  shorter;  four  reaction 
times  are  assumed:  1,  2,  2.4,  8, 
and  2 seconds  vs.  3.2,  6.5,  8.3, 
5.5  seconds  for  a disengaged 
pilot. 

c.  Simulated  fixation  duration  is 
used  as  a performance  indicator 
for  engagement  assessment 

Figure  3 shows  the  imposed  workload,  low 
workload  more  than  2 minutes,  and 
subjective  evaluation  of  disengagement. 


Figure  3:  Workload,  low  workload  more  than 
2 minutes,  and  subjective  evaluation  of 
disengagement 
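
The first two assumptions and the "low workload more than 2 minutes" signal can be sketched as below. The sketch simplifies the assumptions by treating every sample at or below the 30k ft cruise threshold as high workload; the sample times and altitudes are hypothetical and are not the AAL1238 data.

```python
def imposed_workload(altitude_ft: float, cruise_floor_ft: float = 30000.0) -> str:
    """'low' during cruise (above the cruise floor), otherwise 'high'."""
    return "low" if altitude_ft > cruise_floor_ft else "high"


def sustained_low_workload(samples, min_duration_min: float = 2.0):
    """Flag samples where low workload has lasted more than min_duration_min.

    samples: list of (time in minutes, altitude in feet), ordered by time.
    Returns one boolean per sample.
    """
    flags = []
    low_since = None  # time at which the current low-workload run started
    for t, altitude in samples:
        if imposed_workload(altitude) == "low":
            if low_since is None:
                low_since = t
            flags.append(t - low_since > min_duration_min)
        else:
            low_since = None
            flags.append(False)
    return flags


# Hypothetical samples: climb, cruise, descent.
samples = [(0, 1300), (10, 25000), (20, 31000), (21, 33000),
           (23, 35000), (200, 35000), (230, 12000)]
flags = sustained_low_workload(samples)
```

The resulting flags stay false through the climb and the first two minutes of cruise, turn true for the rest of cruise, and reset once the aircraft descends below the threshold.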

For an engaged pilot, the fixation duration is
usually smaller than that of a disengaged
pilot, who may be in a state of daydreaming
or high fatigue. Also, the reaction time of the
engaged pilot is usually shorter than that of
a disengaged pilot. As an example, Figure 4
and Figure 5 show the physiological signals
(normalized fixation duration) and the
physiological indicators of a disengaged
pilot and an engaged pilot, respectively. It
can be seen that a few more physiological
indicators of disengagement are found for a
disengaged pilot (shown in black).


Figure  4:  Physiological  signals  and 
physiological  indicators  of  a disengaged 
pilot 


Figure  5:  Physiological  signals  and 
physiological  indicators  of  an  engaged  pilot 


By  combining  with  the  performance 
indicators  derived  from  reaction  time,  we 
derived  the  engagement  score  for  an 
engaged  pilot  vs.  a disengaged  pilot  with 
the  method  described  in  this  paper.  An 
example  plot  of  the  final  engagement  scores 
is  shown  in  Figure  6. 




Figure  6:  Disengagement  score  of  an  engaged 
pilot  vs.  a disengaged  pilot 

Clearly, from the example shown above, we
can see that engagement/disengagement
state assessment cannot solely rely on the
subjective evaluation results, and low
workload does not necessarily indicate a
disengaged state. Physiological indicators
of disengagement and selected non-intrusive
performance measures, although
discrete, can usually provide a better
estimation of disengagement.

6.0 CONCLUSION

In  this  research,  we  have  successfully 
developed  a systematic  approach  for 
engagement  assessment.  The  approach  is 
based  on  a thorough  understanding  of  the 
relationship  among  performance,  workload, 
and  engagement.  To  train  a real-time 
engagement  assessment  model,  we  have 
developed  a systematic  approach  to  identify 
the  engagement  ground  truth  based  on 
different  sources  of  information:  workload, 
non-intrusive  performance  measures, 
physiological  indicators,  and  subjective 
evaluations.  The  ground  truth  identification 
approach was demonstrated using
simulation data derived from the AAL1238
flight on 05/12/2010.

One of the future tasks is to
implement the proposed real-time
assessment technique on the enhanced
committee machine-based model and to
verify and validate its performance with
experimental data. Another important task is
to continue addressing the individual
variation in the enhanced committee
machine-based real-time engagement
assessment model.
Background

• Definition of Engagement

Attentional state of an operator during execution of a given task

• Two types of hazardous states of awareness that may lead to
human errors:

- A stress state due to high cognitive workload

- A disengaged state due to low workload, poor physical fitness, etc.


• An  operator  in  a low  engagement  level  may  neglect, 
misunderstand,  or  respond  slowly/inappropriately  to 
unexpected  events 

- For  example,  commercial  flight  pilots  can  easily  get  disengaged  during 
enroute  phase  under  low  workload 


Definitions 


• Imposed workload, WLimposed

- The workload that is assigned to successfully accomplish the
given task.


• Workload capacity, WLc

- The maximum workload an operator can handle in the task


• Effective workload, WLee

- The workload that the operator actually delivered toward the
given task
Engagement and Performance


• WLee = WLc * E * Eff

where E denotes the engagement (a score from 0 to 100),
WLee the effective workload, WLc the workload capacity, and
Eff stands for efficiency, which is determined by how efficiently
the operator accomplishes a task

• Performance:

$\text{Performance} = \begin{cases} 1, & \text{if succeed} \\ WL_{ee} / WL_{imposed}, & \text{otherwise (failed)} \end{cases}$

Note that

$WL_{ee} / WL_{imposed} = (WL_{c} \cdot E \cdot E_{ff}) / WL_{imposed}$

Engagement  Analysis 


• Engagement is determined by multiple factors

- Task challenging level, α = WLimposed / WLc

- Physical fitness, e.g., sleep loss, sickness

- Environmental conditions


• We focus on the relationship between the
engagement and imposed workload


$E(\alpha) = f_{\beta}(\alpha)$

α denotes the challenging level,
and β denotes other factors that
affect engagement


Engagement  Assessment 


• Awareness  evaluation 

- e.g., reaction time to the Air Traffic Controller (ATC)
communications

- Continuous and/or discrete

• Physiological signals and features

- e.g., EEG readings, eye fixation durations, heart rate variability
(HRV), etc.

- Continuous

• Subjective evaluation

- e.g., fatigue-related Psychomotor Vigilance Test (PVT)

- Discrete 


Overall  Engagement  Assessment 

• Fusion of continuous objective engagement measure
Eo(t) with discrete subjective measure Es(tj)

$E(t) = w_o E_o(t) + w_s g(E_s, t)$

• g(Es, t) is a prediction
function based on the
previous discrete measures

• g(Es, t) can be estimated
using a data-driven or
parametric model method
in the prediction theory
Real-time Engagement Assessment Model

[Block diagram. Blocks: Physiological Features; Objective/Continuous Engagement Assessment Model; Objective Engagement Assessment; WLimposed; Discrete Non-intrusive Performance; Fuzzy Logic; Discrete-time Engagement Assessment; Fusion]

$\text{Engagement} = w_o E_o(t) + w_s g(E_s, t)$


Enhanced  Committee  Machine  for  Objective/ 
Continuous  Engagement  Assessment 

Enhanced  Committee  Machine  [1] 

• Feature  selection  + Bootstrapping 

• Advanced  Feature  Selection:  Piecewise  Linear 
Orthogonal  Floating  Search  (PLOFS)  [2] 

- Computationally  Efficient 

• Performance: Wrapper type

• Speed:  Filter  type 

- Select  from  Original  Features 

• No  transformation  needed  like  PCA 

- Consider  interactions  among  features 

- Generate  a list  of  combinations 

• Bootstrapping:  resubmission 
Enhanced Committee Machine Architecture


Committee  1: 
Trained  with 
different  initial 
weights 

Committee  2: 
Bagging 
combined  with 
feature 
selection 

Committee  2 
is  better 


Simulations  Results 

AAL1238  on  05/12/2010,  from  Seattle  to  Chicago  O’Hare  [3] 


Time     Latitude  Longitude  Heading  Direction  KTS  MPH  Feet    Rate   Location
12:16PM  47.42     -122.31    178°     South      176  195  1,300          Seattle Center
12:17PM  47.35     -122.31    147°     Southeast  209  241  3,300   2,226  Seattle Center
12:18PM  47.31     -122.2?    64°      Northeast  237  273  6,400   3,006  Seattle Center
12:19PM  47.34     -122.16    55°      Northeast  195  224  9,900   2,326  Seattle Center
12:20PM  47.38     -122.07    59°      Northeast  246  233  11,500  1,326  Seattle Center
12:21PM  47.43     -121.94    91°      East       324  373  13,200  1,856  Seattle Center
12:22PM  47.43     -121.01    92°      East       339  390  15,306  2,286  Seattle Center
12:23PM  47.43     -121.54    90°      East       357  411  17,800  2,346  Seattle Center

Flight  AAL1238  on  5/12/2010 
altitude  vs.  time 
Simulation Results (cont')


•Simulated  datasets 


Workload,  low  workload  more  than  2 minutes, 
and  subjective  evaluation  of  disengagement 


Simulation  Results  (cont') 


Normalized eye fixation duration and physiological indicators of a disengaged pilot

Normalized eye fixation duration and physiological indicators of an engaged pilot
Simulation Results (cont')


Engagement  score  of 
an  engaged  pilot  vs. 
a disengaged  pilot 


•Engagement  assessment  cannot  solely  rely  on  the  subjective 
evaluation 

•Low  workload  does  not  necessarily  indicate  a disengaged  state 
•Physiological  indicators  and  selected  non-intrusive  awareness 
measures  usually  provide  a better  estimation 


Summary 

• Established  the  relationship  among  performance, 
workload  and  engagement 

• Developed  a systematic  approach  for  engagement 
assessment 

• Demonstrated  the  feasibility  of  the  proposed 
approach  with  simulations 

• Future  work: 

- Verify  and  validate  the  proposed  system  with  experimental  data 

- Address  the  individual  variation  in  the  enhanced  committee 
machine-based  real-time  engagement  assessment  model 
Abstract. The Next Generation Air Transportation System will introduce new, advanced sensor technologies into the cockpit that
must convey a large number of potentially complex alerts. Our work focuses on the challenges associated with prioritizing aircraft
sensor alerts in a quick and efficient manner, essentially determining when and how to alert the pilot. This "alert decision" becomes
very difficult in NextGen due to the following challenges: 1) the increasing number of potential hazards, 2) the uncertainty associated
with the state of potential hazards as well as pilot state, and 3) the limited time to make safety-critical decisions. In this paper, we
focus on pilot state and present a model for anticipating duration and quality of pilot behavior, for use in a larger system which
issues aircraft alerts. We estimate pilot workload, which we model as being dependent on factors including mental effort, task
demands, and task performance. We perform a mathematically rigorous analysis of the model and resulting alerting plans. We
simulate the model in software and present simulated results with respect to manipulation of the pilot measures.


1.0  INTRODUCTION 

The  introduction  of  Next  Generation  Air 
Transportation  System  (NextGen) 
technologies  into  the  cockpit  is  expected  to 
dramatically  increase  the  responsibilities  of 
the pilot (JPDO, 2007). In particular,
additional  aircraft  alerting  systems  will  be 
introduced,  and  the  pilot  will  need  to  adapt 
to  the  increase  in  both  the  number  and 
types  of  possible  hazard  alerts.  NextGen 
will  also  introduce  additional  automation 
technologies  into  the  cockpit,  capable  of 
addressing  alerts  with  minimal  assistance 
from  the  human  pilot.  However,  interfacing 
these  technologies  with  both  the  human 
pilot  as  well  as  the  large  number  of  possible 
hazard  alerts  introduces  a set  of  research 
challenges,  including  how  to  prioritize  the 
alerts,  how  to  plan  the  interaction  between 
human  and  automation  to  address  the 
prioritized  hazards,  and  how  to  adjust  the 
plan  according  to  the  state  and  capabilities 
of  the  pilot. 

In  order  to  address  these  challenges, 
Aptima,  Inc.,  in  cooperation  with  SAIC  and 
under  the  supervision  of  NASA,  is 
developing  a NextGen  aircraft  system  called 
ALARMS  (ALerting  And  Reasoning 
Management  System).  The  system  has  four 
parts:  Bayesian  reasoning  to  determine  type 
and  priority  of  existing  hazards,  a Time 


Dependent  Markov  Decision  Process 
(TMDP)-based  planner  to  address  the 
hazards  in  a timely  fashion,  a human 
performance  estimator  to  inform  the  planner 
as  to  the  state  and  capabilities  of  the  pilot, 
and  an  interface  to  inform  the  pilot  of  alerts 
in  the  best  possible  manner. 

In  this  paper,  we  concern  ourselves  with  the 
third  item,  how  to  estimate  pilot  state  and 
capabilities  in  order  to  inform  a plan  for  the 
human  and  automation  to  cooperate. 
Defining a plan to cooperate has been the
subject of empirical research (Galster, 2003;
Galster & Parasuraman, 2003;
Parasuraman & Riley; Parasuraman,
Sheridan, & Wickens, 2000). One approach
is to describe a level of automation on a
continuum from fully automated hazard
response to fully manual hazard response
(Wickens, Mavor, Parasuraman, & McGee,
1998; Sheridan & Verplank, 1978). An
extended method, proposed by
Parasuraman et al., models human
information processing in four stages:
Sensory Processing, Perception/Working
Memory, Decision Making, and Response
Selection (Parasuraman, Sheridan, &
Wickens, 2000). Differing circumstances
may call for differing stages of automation.
Figure 1: ALARMS System Diagram


But  which  stage  of  automation  is 
appropriate  may  depend  on  several 
variables,  including  the  characteristics  of  the 
hazards,  as  well  as  “pilot  state,”  the  ability  of 
the  pilot  to  perform  under  a given  stage  of 
automation.  In  this  paper  we  introduce  a 
model  for  estimating  pilot  state  on  the 
aircraft,  for  the  purposes  of  informing  the 
hazard  alerting  system.  The  model 
leverages  existing  literature  on  pilot 
workload as well as pilot performance. The
model  in  this  paper  uses  three  stages  of 
automation,  each  of  which  corresponds  to  a 
flight  deck  display.  In  Stage  1 of 
automation,  increasing  workload  will  greatly 
decrease  quality  and  increase  duration  for 
high  workload  conditions  as  compared  to 
low  workload  conditions.  In  Stage  2, 
increasing  workload  will  decrease  quality 
and  increase  duration.  In  Stage  3,  the 
effects  will  be  negligible. 

The rest of the paper is outlined as follows.
First, we introduce the ALARMS system
architecture  and  briefly  outline  its 
components,  including  a hazard  state 
estimation  module  and  a planning  module 
for  stages  of  automation.  Next,  we  outline 
the  pilot  state  estimation  module,  which 
estimates  pilot  workload.  We  show  how  the 
pilot  state  can  be  used  as  input  for  the 
ALARMS  planning  module.  Finally,  we  show 
modeled  results  for  how  changes  of  pilot 
state  will  change  temporal  plans  for  stages 
of  automation,  and  conclude  with  a 
summary  and  a discussion  of  future  steps. 


2.0  BODY:  ALARMS  SYSTEM 
ARCHITECTURE 


The  ALARMS  system  architecture  is  shown 
in  Figure  1.  Proceeding  from  left  to  right, 
multiple  hazards  exist  in  the  environment 
and  result  in  alerts  on  the  flight  deck.  The 
hazards  themselves,  and  the  sensor  alerting 
systems,  are  external  to  the  ALARMS 
system.  The  alerts  are  issued  to  the 
ALARMS  Integrated  System  User  Model. 
The  system  alerts  are  treated  as  evidence, 
and  from  this  evidence  the  ALARMS  state 
estimation  module  estimates  the  actual 
hazards.  A more  detailed  description  of  this 
estimate  can  be  found  in  a companion 
paper  to  this  work  (9).  To  summarize,  a 
Dynamic  Bayesian  Network  (DBN)  is  used. 
DBNs have been used in similar systems,
notably in medical diagnosis (Shwe &
Cooper, 1991). Our use of a DBN in the
ALARMS system is analogous if the system
alerts are treated as “symptoms” used to
estimate the “disease” of the actual hazard.
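The symptom/disease analogy can be illustrated with a single application of Bayes' rule. This is only a one-step simplification of the DBN described in the companion paper, and the function name and all probabilities below are hypothetical values chosen for illustration.

```python
def posterior_hazard(prior, p_alert_given_hazard, p_alert_given_no_hazard):
    """Return P(hazard | alert) from Bayes' rule for a single alert."""
    # Total probability of seeing the alert, with or without the hazard.
    evidence = (p_alert_given_hazard * prior
                + p_alert_given_no_hazard * (1.0 - prior))
    return p_alert_given_hazard * prior / evidence


# Hypothetical numbers: a rare hazard and a fairly reliable alerting system.
belief = posterior_hazard(prior=0.01,
                          p_alert_given_hazard=0.95,
                          p_alert_given_no_hazard=0.05)
print(f"P(hazard | alert) = {belief:.3f}")
```

Even a reliable alert leaves considerable uncertainty when the hazard is rare, which is one reason to treat alerts as evidence rather than as ground truth.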

The  hazard  state  is  combined  with  the 
estimated  state  of  the  pilot  (which  we  will 
return  to  in  a moment),  to  form  a complete 
state  estimate.  This  pilot  and  hazard 
estimate  is  fed  into  a Planning  module.  The 
Planning  module  recommends  the  stage  of 
automation  for  the  hazards,  which  is  fed  into 
the  ALARMS  interface  for  display.  The 
result  is  displayed  to  the  pilot. 

In  this  work,  we  focus  on  the  Human 
Performance  module  in  the  diagram.  This 
module estimates the status of the pilot,
which in turn is used to estimate the
performance of the pilot at each stage of
automation. The planner will then select the
stage of automation that
maximizes the effectiveness of the
combined pilot and automation.

Figure 2: Human Performance Module

2.1  Human  Performance  Module 


We  model  human  performance  as  shown  in 
Figure  2.  The  Output  from  the  module  is  an 
estimate  of  expected  pilot  performance,  in 
terms  of  duration  and  quality  of  pilot 
handling  of  hazards.  The  Variable  of 
Interest,  Workload,  is  the  key  parameter 
representative  of  pilot  state  that  changes 
over  time  and  directly  impacts  performance. 
The  Mediating  Variable,  Fatigue,  influences 
the  relationship.  Other  variables  of  interest 
(e.g.,  situation  awareness)  or  mediating 
variables  may  be  considered  in  future  work. 

In  environments  with  task  demands, 
workload  affects  the  mental  resources  that  a 
pilot  can  access  to  address  the  demands 
(Wickens  & Hollands,  2000).  Specifically, 
the  effect  can  be  modeled  through  a 
performance resource function, or PRF
(Norman & Bobrow, 1975). When cognitive
resources  are  unavailable  or  unused  for  a 
task,  performance  will  be  diminished.  As 
more  resources  are  dedicated,  performance 
will  improve,  until  the  task  becomes  limited 
by  data  and  not  resources.  When  multiple 
tasks  must  be  accomplished,  such  as  is  the 
case  when  a pilot  must  supervise  multiple 
systems  in  the  cockpit,  resource  limitation 
becomes  an  issue  (Kantowitz  & Casper, 
1988).  The  workload  of  the  pilot  will  define 
the  availability  of  a pilot’s  resources  to 
handle  alerts. 
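The shape of a performance resource function can be sketched as a piecewise-linear curve: performance rises with allocated resources until the task becomes data-limited, then stays flat. The function name, the linear form, and the `data_limit` value are assumptions made for illustration; the paper does not specify a functional form.

```python
def prf_performance(resources, data_limit=0.6, max_performance=1.0):
    """Piecewise-linear PRF sketch.

    resources: fraction of cognitive resources allocated to the task (0..1).
    data_limit: hypothetical point beyond which extra resources do not help.
    """
    r = min(max(resources, 0.0), 1.0)   # clamp to the valid range
    if r >= data_limit:
        return max_performance          # data-limited region: flat
    return max_performance * r / data_limit  # resource-limited region


for share in (0.0, 0.3, 0.6, 1.0):
    print(share, prf_performance(share))
```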

It is possible to assess workload as an
index,  and  several  criteria  have  been 
specified  to  compute  the  index  (Wickens  & 
Hollands,  2000;  O'Donnell  & Eggemeier, 
1986).  Among  these  criteria:  a satisfactory 
workload  index  is  sensitive  to  changes  in 
task  demands,  diagnoses  the  cause  of 
workload  variation,  is  selective  in  that 
factors  that  do  not  affect  workload  are  not 
included  in  the  index,  is  unobtrusive  in  that 
the  computation  of  the  index  does  not  affect 
workload  itself,  and  is  reliable. 

For  ALARMS,  we  identify  three  factors  that 
predict workload: mental effort, task
demands, and ongoing task performance.
We  also  identify  relevant  measures  of  these 
factors  from  the  literature. 

2.1.1  Mental  Effort 

We  follow  the  literature  by  specifying  mental 
effort  as  a contributing  factor  to  workload. 
High  levels  of  performance  can  be  achieved 
under  conditions  of  normal  mental  effort 
while  extremely  high  mental  effort  situations 
tend  to  result  in  decreased  performance. 
Measures  of  mental  effort  include  both 
subjective  and  physiological  measures 
(Veltman,  2001). 

Subjective  information  in  our  model 
includes  potential  measures  such  as  the 
NASA  TLX  scale  (Hart  & Staveland,  1988), 
which  allows  the  operator  to  specify  mental 
demand,  physical  demand,  temporal 
demand,  performance,  effort,  and  frustration 
level.  The  Bedford  Workload  scale  (Roscoe, 
1984),  on  the  other  hand,  is  a decision  tree, 
and  the  leaves  of  the  tree  provide  a 
workload  score  on  a single  dimension. 

Physiological  information  can  also  be 
obtained.  Examples  of  potential  measures 
include  electroencephalography  (EEG)  or 
heart  rate  variability  (HRV).  It  has  been 
shown  that  heart  rate  can  differentiate 
between  phases  of  flight  (which  require 
different  levels  of  mental  effort)  for  pilots 
and co-pilots (Bonner & Wilson, 2002), even
when  subjective  measurements  do  not. 

2.1.2  Task  Demands 

In  the  prior  subsection,  mental  effort  is 
described  as  being  necessary  to  accomplish 
tasks.  The  level  of  effort  demanded  will 
depend  on  the  task.  Simple  tasks  will 
require  smaller  amounts  of  resources,  while 
complex  tasks  will  require  a higher  degree 
of  mental  effort.  Measures  of  Task 
Demands  include  both  the  complexity  of 
tasks  and  the  number  of  tasks. 

Task  complexity  can  affect  workload; 
specifically,  complex  tasks  will  result  in  a 
higher workload. For example, the landing
phase of flight produces higher workload
than the enroute phase (Bonner & Wilson,
2002). As a second example, more
automated tasks consume fewer resources
than less automated ones (Schneider &
Fisk, 1982).

Number  of  tasks  affects  workload  as  well, 
in  two  ways.  First,  the  presence  of 
additional  tasks  adds  to  workload.  Second, 
there  is  a cost  to  switching  among  tasks 
(Rogers  & Monsell,  1995).  Thus,  the 
contribution  of  tasks  to  workload  exceeds 
the sum of the tasks' complexities.
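One simple way to express this is to add a switching penalty for each task beyond the first. The additive form and the `switch_cost` value are assumptions for illustration only; the paper states the qualitative relationship but not a formula.

```python
def task_demand(complexities, switch_cost=0.1):
    """Total demand: summed task complexity plus a hypothetical
    switching cost for every task beyond the first."""
    if not complexities:
        return 0.0
    return sum(complexities) + switch_cost * (len(complexities) - 1)


single = task_demand([1.0])
paired = task_demand([1.0, 2.0])
print(single, paired)
```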

2.1.3  Ongoing  Task  Performance 
Workload  contributes  to  the  model  insofar 
as  it  is  predictive  of  pilot  performance.  Thus, 
a well-accepted  manner  of  estimating 
workload  is  to  examine  performance 
directly.  Potential  measurements  include 
Flight  Technical  Errors,  Navigation 
Errors,  and  Communication  Errors. 

These  errors  can  be  measured  by  the 
ALARMS  system  at  run-time. 

2.2  Interface  to  ALARMS  Planner 

The  goal  of  estimating  pilot  Workload  is  to 
predict  the  quality  and  duration  of  pilot 
actions  so  that  a joint  pilot/automation  plan 
can  be  formed  to  address  hazards.  In  this 
section,  we  describe  the  details  of  the 
interface  to  the  planner.  We  begin  by 
summarizing  the  planner  itself,  as 
introduced in (Carlin, Marecki, & Schurr,
2010)  and  adjusted  in  this  paper  to  account 
for  pilot  state. 

2.2.1  ALARMS  TMDP  Planner 

We model the ability of the pilot and system
to address hazards with a Time-Dependent
Markov Decision Process (TMDP) (Boyan &
Littman, 2000). A TMDP is a tuple
<S,A,P,D,R>, where S is a set of states, A
is a set of actions, P is a transition
matrix, D is a set of probability density
functions, and R is a reward function. Assume a
finite set S of discrete states and a finite set
A of actions. When the process is in state s in S
and action a in A is executed, the process
transitions with probability P(s,a,s’) to state
s’ in S. The transition consumes t units of
time with probability density d(s,a,s’,t), where
d(s,a,s’,t) is a probability density function
over t for a given s, a, and s’. Similarly, the
reward R(s,a,s’) depends on s, a, and s’.
Reward is received when an action terminates.

A deterministic TMDP policy is a mapping
S × [0,Δ] -> A, where Δ (the deadline) is the
earliest point in time after which all rewards
are zero.
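The tuple definition above maps naturally onto a small data structure. The sketch below is one possible encoding, not the authors' implementation; the class name, the field names, the uniform duration model, and the toy numbers are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

State = Tuple[str, ...]    # e.g. ("C", "N", "W")
Action = Tuple[int, ...]   # e.g. (1, 3, 2)


@dataclass(frozen=True)
class TMDP:
    states: Sequence[State]
    actions: Sequence[Action]
    transition: Callable[[State, Action, State], float]           # P(s, a, s')
    duration_pdf: Callable[[State, Action, State, float], float]  # d(s, a, s', t)
    reward: Callable[[State, Action, State], float]               # R(s, a, s')
    deadline: float                      # all rewards are zero after this time


# A deterministic policy maps a state and a time in [0, deadline] to an action.
Policy = Callable[[State, float], Action]


def uniform_duration(low, high):
    """Hypothetical duration model: uniform density on [low, high]."""
    def pdf(s, a, s_next, t):
        return 1.0 / (high - low) if low <= t <= high else 0.0
    return pdf


# Toy one-hazard problem with made-up numbers.
toy = TMDP(
    states=[("C",), ("N",)],
    actions=[(1,)],
    transition=lambda s, a, s_next: 1.0 if s_next == ("N",) else 0.0,
    duration_pdf=uniform_duration(2.0, 6.0),
    reward=lambda s, a, s_next: 10.0 if s == ("C",) else 0.0,
    deadline=20.0,
)
print(toy.duration_pdf(("C",), (1,), ("N",), 4.0))
```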

Let  T be  a set  of  alert  levels  (e.g.  “Nominal” 
(N),  “Advisory”  (A),  “Caution”  (C),  “Warning” 
(W),  or  “Directive”  (D)),  O be  an  ordered  set 
of  hazards,  and  Q be  a set  of  autonomy 
stages.  A TMDP  problem  in  ALARMS  is 
defined  as  follows: 

• States: A state s in S is a mapping
from the hazards to their alert
levels. For example, given three
hazards, state s = <C,N,W> defines
that the first hazard is at Caution
level, the second hazard is at
Nominal level, and the third hazard
is at Warning level.

• Actions: The actions of the ALARMS
system represent the different ways
in which the system displays the
information about the hazards on the
pilot’s GUI. Each component of an
action represents a stage of
autonomy. For example, given three
hazards, action a = <1,3,2> will
present the three hazards at stages 1, 3,
and 2 of autonomy. It is possible to
present different hazards at the
same stage of autonomy (e.g.,
<2,2,2>).

• Transitions: ALARMS assumes that
all hazards will eventually be
addressed (their alert levels will
return to N as a result of human or
autonomy actions). An exception is
when the action is not to address the
hazard at any stage of automation
(e.g., <0>), in which case the state
remains the same for that hazard.

• Durations: ALARMS models action
duration distributions by assuming
that actions at differing stages of
automation take different durations.
The specific durations of actions will
be affected by pilot state, as we will
specify in the next section.

• Reward:  Reward  is  achieved  for 
addressing  the  hazard  and 
transitioning  back  to  a nominal  state. 
Each  hazard  can  have  a different 
reward  associated  with  it.  Reward 
will  also  depend  on  the  pilot  state,  as 
we  will  see  in  the  next  section. 
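The state and action spaces defined above can be enumerated directly. The following sketch assumes the five alert levels and three stages of automation named in the text, with stage 0 meaning "hazard not addressed"; the function names are hypothetical, and the deterministic successor is a simplification of the transition model.

```python
from itertools import product

ALERT_LEVELS = ("N", "A", "C", "W", "D")  # Nominal ... Directive
STAGES = (0, 1, 2, 3)                     # 0 = hazard is not addressed


def enumerate_states(num_hazards):
    """All assignments of an alert level to each hazard."""
    return list(product(ALERT_LEVELS, repeat=num_hazards))


def enumerate_actions(num_hazards):
    """All assignments of a stage of automation to each hazard."""
    return list(product(STAGES, repeat=num_hazards))


def successor(state, action):
    """Addressed hazards return to Nominal; stage 0 leaves a hazard unchanged."""
    return tuple(level if stage == 0 else "N"
                 for level, stage in zip(state, action))


print(len(enumerate_states(3)), len(enumerate_actions(3)))
print(successor(("C", "N", "W"), (1, 3, 2)))
print(successor(("C", "N", "W"), (0, 3, 2)))
```

The joint spaces grow exponentially with the number of hazards, which is worth keeping in mind when choosing how many hazards to plan over at once.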

2.2.2  Pilot  State  in  ALARMS  TMDP 


As shown in Figure 2, Workload affects the
duration and quality of pilot actions in the
ALARMS model. This is accomplished by
a two-step process. First, a
workload score is computed from
measurements of the factors through a
linear weighting:

Workload = α*ME + β*TD + γ*TP

where ME represents Mental Effort, TD
represents Task Demands, and TP
represents Task Performance. α, β, and γ
are linear weights that allow the
prioritization of the factors to be varied.
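The linear weighting of Mental Effort (ME), Task Demands (TD), and Task Performance (TP) can be sketched as follows. The numeric encoding of the Low/Medium/High levels is an assumption; the paper states only that the factors are combined linearly.

```python
# Hypothetical numeric encoding of the Low/Medium/High factor levels.
LEVEL_SCORE = {"Low": 0.0, "Medium": 0.5, "High": 1.0}


def workload_score(mental_effort, task_demands, task_performance,
                   alpha=1.0, beta=1.0, gamma=1.0):
    """Workload = alpha*ME + beta*TD + gamma*TP."""
    return (alpha * LEVEL_SCORE[mental_effort]
            + beta * LEVEL_SCORE[task_demands]
            + gamma * LEVEL_SCORE[task_performance])


low = workload_score("Low", "Low", "Low")
high = workload_score("High", "Medium", "High")
print(low, high)
```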

In  the  second  step,  the  workload  score  is 
used  to  modify  the  Duration  and  Reward 
function  of  the  ALARMS  TMDP.  We  use  the 
Workload  estimate  to  feed  information  into 
the  Integrated  User  Module  about  the 
expected  capabilities  of  the  pilot,  specifically 
the  expected  performance  quality  and  the 
expected  duration  of  pilot  actions.  The  effect 
of  Workload  varies  according  to  the  stage  of 
automation. In Stage 1 of automation,
high workload in our model greatly
decreases quality and increases duration
relative to low workload conditions.
In Stage 2, increasing workload decreases
quality and increases duration to a lesser
degree. In Stage 3, we make
the effects negligible.

Figure 3: ALARMS System Designer Interface

Figure 4: High Workload Plan

The specific quantities attached to these
terms (“greatly decrease,” etc.) are
parameters in our model. At present, we set
quality and duration to halve and double,
respectively, in Stage 1 when workload
changes from Low to High. Similarly, we set
quality and duration to decrease and
increase by 25% in Stage 2, and to decrease
and increase by 5% in Stage 3. Medium
workload is currently simulated by
interpolating between the high and low
workload conditions.
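These parameter settings can be written as multipliers on quality and duration. The sketch below uses the percentages stated above; the function name, the 0-to-1 workload scale, and the choice of linear interpolation for intermediate workload are assumptions.

```python
# (quality multiplier, duration multiplier) under High workload, per stage.
HIGH_WORKLOAD_FACTORS = {
    1: (0.50, 2.00),   # quality halves, duration doubles
    2: (0.75, 1.25),   # 25% decrease / increase
    3: (0.95, 1.05),   # 5% decrease / increase
}


def pilot_factors(stage, workload_level):
    """Return (quality, duration) multipliers for a stage of automation.

    workload_level: 0.0 = Low, 1.0 = High; values in between are
    linearly interpolated (an assumed form of the interpolation).
    """
    q_high, d_high = HIGH_WORKLOAD_FACTORS[stage]
    quality = 1.0 + workload_level * (q_high - 1.0)
    duration = 1.0 + workload_level * (d_high - 1.0)
    return quality, duration


for stage in (1, 2, 3):
    print(stage, pilot_factors(stage, 0.5))
```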


3.0  DISCUSSION/EVALUATION 


In  order  to  evaluate  the  effect  of  the  factors 
on  workload  and  on  existing  plans,  an 
ALARMS  System  Designer  Interface  was 
developed.  The  Interface  is  shown  in  Figure 
3.  On  the  top,  multiple  system  alerts  can  be 
specified  at  various  levels  of  alert. 

Directives  are  the  highest  priority  of  alert, 
followed  by  Warnings,  Cautions,  and 
Advisories.  The  entries  “TCAS”  and  “Traffic 
information  display”  are  indicative  of  a loss 
of  separation  hazard,  and  the  entries 
“Landing  Gear”  and  “Flight  deck  display”  are 
indicative  of  a system  failure  hazard 
encountered  while  landing.  Thus,  we  see  in 
the  figure  that  there  is  a caution-level  alert 
for  loss  of  separation,  and  a lower-priority 
advisory  for  a system  failure  hazard.  Below 
the  hazards,  the  factors  affecting  pilot  state 
are  specified,  including  Mental  Effort,  Task 
Demands, and Task Performance. Each
factor is given a weight (corresponding to
α, β, γ); in this case the weights are all 1.0.
The  figure  shows  that  factors  are  all 
selected  as  “Low,”  and  thus  the  pilot  is 
under  low  workload  conditions. 

Below,  we  see  a time-dependent  plan  for 
addressing  the  hazards,  as  computed  by 
ALARMS.  The  x-axis  is  in  time  units. 

Without  loss  of  generality,  it  is  assumed  that 
the  tasks  have  a “deadline”  at  the  20  unit 
mark  on  the  x-axis,  and  the  plan  works 
backwards  from  that  mark.  The  y-axis 
shows  the  utility  of  the  plan  (on  a relative 
scale  to  the  ALARMS  planning  problem).  As 
expected,  utility  decreases  as  the  deadline 
approaches.  The  figure  shows  that 
ALARMS  produces  a 3-part  plan  for 
addressing  the  hazards  under  these 
conditions. Each part of the plan consists of
the letter “L” followed by a stage of
automation for each hazard. Thus “L11”
denotes that both hazards are handled at
Stage 1 of automation, and “L12” indicates
that the loss of separation hazard is handled
at Stage 1 and the system failure hazard is
handled at Stage 2. The figure shows that
when there are more than 8 time units
remaining, the hazards are both handled at
Stage 1 of automation. As the deadline
approaches, the recommended stage of
automation transitions to “L12”; that is, the
lower-priority hazard is handled at a higher
stage of automation. Within 1.5 time units of
the deadline, the plan
transitions to “L13”; the lower-priority hazard
is handled at Stage 3 of automation.
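A time-dependent plan of this kind amounts to a lookup from time remaining to a recommended action. The sketch below encodes the low-workload plan as described in the text; the function name and the handling of the exact boundary values are assumptions.

```python
import bisect

# Breakpoints in time units remaining before the deadline, read from the text.
BREAKPOINTS = [1.5, 8.0]
# Actions as (loss-of-separation stage, system-failure stage).
ACTIONS = [(1, 3), (1, 2), (1, 1)]   # "L13", "L12", "L11"


def recommended_action(time_remaining):
    """Return the stages of automation recommended at this point in the plan."""
    return ACTIONS[bisect.bisect_left(BREAKPOINTS, time_remaining)]


for remaining in (10.0, 5.0, 1.0):
    print(remaining, recommended_action(remaining))
```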

Figure  4 shows  a second  plot.  Here,  the 
Phase  of  Flight  has  been  changed  to  Land, 
Mental  Effort  and  Task  performance  are 
labeled  as  indicating  “High”  workload 
conditions, and Task Demands are
“Medium.” This is a higher workload
condition  than  the  first  example.  As  a result, 
the  ALARMS  planner  is  informed  that  low 
stages  of  automation  will  be  less  effective. 
The  resulting  plan  at  the  bottom  of  the  figure 
shows  that  higher  stages  of  automation  are 
selected  at  earlier  points  in  time. 

4.0  CONCLUSION 

In  this  paper,  we  introduced  a model 
designed  to  predict  pilot  performance  in  the 
cockpit,  proposed  to  be  implemented  as  a 
component  of  NextGen  alerting  systems. 
The  larger  ALARMS  system  design  consists 
of  Bayesian  reasoning  to  determine  type 
and  priority  of  existing  hazards,  a Time 
Dependent  Markov  Decision  Process-based 
planner  to  address  the  hazards  in  a timely 
fashion,  a human  performance  estimator  to 
inform  the  planner  as  to  the  state  and 
capabilities  of  the  pilot,  and  an  interface  to 
inform  the  pilot  of  alerts  in  the  best  possible 
manner.  In  this  paper  we  focused  on  a 
model  to  contribute  to  the  human 
performance  estimator. 

Key  components  of  the  model  are  that  it 
estimates  workload,  it  predicts  the  duration 
and  quality  of  pilot  performance,  and  it  can 
be  used  to  recommend  what  information  will 
be displayed to the pilot, and what
information processing stage will be
supported. 

Future  work  consists  of  several  directions. 
First,  we  will  focus  on  the  real-time  nature  of 
the  measures,  and  how  such 
measurements  can  be  integrated  into  the 
cockpit  in  an  unobtrusive  manner.  Second, 
we  will  embellish  the  model  further.  For 
example,  the  literature  on  task  switching  as 
well  as  issues  related  to  attention  (Yerkes  & 
Dodson, 1908) can be added to the model.

NextGen Aircraft Alerting Systems

■ Future flight deck systems will require sophisticated
information management
notification (IAN) function to:

- Continuously monitor info from various sources to evaluate hazard potential

Levels of Automation*

10 - Computer decides everything, acts autonomously
9 - Informs human if it decides to
8 - Informs human only if asked
7 - Executes then informs

4 - Suggests one alternative
3 - Narrows selection to a few
2 - Offers complete set of alternatives
1 - No computer assistance

*Wickens 1998, based on Sheridan and Verplank 1978

Example 4 stage model
- Sensory Processing
- Perception/Working Memory
- Decision Making
- Response Selection

Four classes of functions
- Information acquisition
- Information analysis
- Decision and action selection
- Action implementation

Integrated User Model

Mock-ups designed by Andy Chang, M.S., and Dr. Amy Alexander

Pilot State Refinement

Workload

- Performance Resource Function (PRF)
  ■ Finite amount of resource able to be allocated to problem
  ■ Timesharing
  ■ Automaticity
- Satisfactory workload index has these properties:
  ■ Diagnoses cause of variation
  ■ Selective of factors
  ■ Unobtrusive
  ■ Reliable

©2010, Aptima, Inc.


■ Bedford  Workload  Scale 

- Physiological  Information  (EEG,  HRV) 

■ Task  Demands 

- Task  Complexity 

- Number  of  Tasks 

■ Ongoing Task Performance

■ Mental Effort, Task Demands, and Ongoing Task
Performance are used to compute Workload
- We model these as Low/Medium/High, take linear combination
- Workload = α*ME + β*TD + γ*TP
- Workload is combined with information about hazard state and
stage of automation to determine the quality and duration of
action.

■ Increasing Pilot Workload leads to:
- Greatly decreasing quality, increasing duration in Stage 1
- Decreasing quality, increasing duration in Stage 2
- Very small effects in Stage 3

■ Thus, increasing Pilot Workload will tend to increase the
stage of automation.


Identify CWA (Caution/Warning/Alerting)
Applications For NextGen and Develop a System engineering

■ Design Approach
- Define Alerting Levels

  Alert       Timeframe
  Directive   < 10 seconds
  Warning     10-15 seconds
  Caution     < 40 seconds
  Advisory    non-critical

  23 Existing Systems Reviewed
  13 NextGen Planned Systems Reviewed

- Define Hazard List
  ■ System Failure
  ■ System Performance Compromised
  ■ Loss of Separation
  ■ Adverse Weather Encounter
  ■ Altitude Deviation
  ■ Navigation Deviation
  ■ Controlled Flight Into Terrain
  ■ Crew Incapacitation
  ■ Flight Performance Compromised
  ■ Structural Failure
  ■ Life Support
  ■ Protected Airspace Incursion
  ■ Loss of Communication
  ■ Runway Incursion

©2009, Aptima, Inc.

Hazard Matrix

[Table: Hazard Matrix. Columns list hazards, including System Failure, System Performance Compromised, Loss of Separation, Adverse Weather Encounter, and Altitude Deviation. Rows list tools, technologies, and sub-systems, including Electrical (3.1.11), Hydraulic (3.1.15), Fuel (3.1.14), Landing Gear (3.1.17), Conditioning/Pressurization (3.1.21/22), Ice Protection (3.1.16), Fire Protection (3.1.12), Enhanced Ground Proximity Warning (EGPWS) (3.1.20), and Enroute (3.1.1). Cells give the alert level.]

A=Advisory, C=Caution, W=Warning
Blue=Aviation, Green=Navigation, Purple=Communication

ALARMS Bayesian Network

[Figure: Bayesian network relating hazards, subsystems, and alerts]


Integrated User Model

TMDP model (Boyan and Littman 2000)
- <S,A,P,D,R>
- Set of States: Mapping from hazards to alert levels
  ■ Example: <N,A,W>
- Set of Actions: Stage of Automation for each hazard
  ■ Stage 1 = human-centric, Stage 3 = highly automated
  ■ Example: <1,3,2>
- Probabilistic Transitions: P(s,a,s’)
- Action Durations: Probability Density Function d(s,a,s’,t)
  ■ Deadline Δ after which there is no reward
  ■ More automated actions: Quicker
- Reward for addressing hazard and returning to nominal state
  ■ More automated actions: Lower reward
- Policy: Mapping from state and time to action




Define values V(s)(t)

Expected Reward (for Bellman Backup)

$$\arg\max_{a} \sum_{s'} P(s' \mid s, a) \int_{0}^{t} p_{s,a}(t') \left( R(s,a,s') + V^{*}(s')(t - t') \right) dt'$$

Since continuous, need approximate value functions
- Convolution becomes intractable
- Solution: phase-type distributions
- Convert to MDP with uniform action durations

$$\int_{0}^{t} p(t')\, V(s')(t - t')\, dt'$$
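The backup can be illustrated on a discretized time axis, where the integral becomes a sum over integer durations. Note that this is plain time discretization, not the phase-type approach named on the slide, and the function name, the toy problem, and all numbers are hypothetical.

```python
def solve_discrete_tmdp(states, actions, P, dur, R, horizon):
    """Return value and policy tables indexed by state and time remaining.

    P[(s, a)]     maps successor states to transition probabilities.
    dur[(s, a)]   maps integer durations (>= 1) to probabilities.
    R[(s, a, s2)] is the reward received when the action terminates.
    """
    V = {s: [0.0] * (horizon + 1) for s in states}
    policy = {s: [None] * (horizon + 1) for s in states}
    for t in range(1, horizon + 1):          # t = time units remaining
        for s in states:
            for a in actions:
                if (s, a) not in P:
                    continue                 # action not applicable in s
                q = 0.0
                for s2, p_trans in P[(s, a)].items():
                    for d, p_dur in dur[(s, a)].items():
                        if d <= t:           # must finish before the deadline
                            q += p_trans * p_dur * (R[(s, a, s2)] + V[s2][t - d])
                if q > V[s][t]:
                    V[s][t], policy[s][t] = q, a
    return V, policy


# Toy problem: one hazard at Caution ("C") that returns to Nominal ("N").
# The less automated action is slower but earns a higher reward.
states = ["C", "N"]
actions = ["stage1", "stage3"]
P = {("C", "stage1"): {"N": 1.0}, ("C", "stage3"): {"N": 1.0}}
dur = {("C", "stage1"): {4: 0.5, 6: 0.5}, ("C", "stage3"): {1: 1.0}}
R = {("C", "stage1", "N"): 10.0, ("C", "stage3", "N"): 6.0}
V, policy = solve_discrete_tmdp(states, actions, P, dur, R, horizon=8)
print(policy["C"])
```

In this toy problem the policy switches from the slower, higher-reward action to the quicker one as the time remaining shrinks, the same qualitative behavior as the ALARMS plans.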
ALARMS System Designer Interface

*Interface GUI programmed in FLEX by Gilbert Mizrahi, M.S., at Aptima

[Screenshot: ALARMS Designer's Interface, Model Configurator. Alert levels and rewards are set for the Weather Radar, Ice Protection, VNAV, and CDTI alerting systems; Phase of Flight can be set to EnRoute, Approach, or Land. Pilot State: EnRoute.]

[Screenshot: ALARMS Designer's Interface, Model Configurator, with the same alerting systems. Pilot State: Land. The plan is plotted against Time, from 0 to 20.]

■ Contributions of ALARMS
- Introduced system architecture
- Developed interface mockups
- Identified current and future hazards
- Bayesian reasoning over uncertain hazard state and sensor
systems
- TMDP planning of interface stages of automation
  ■ TMDP planning reasons about time duration uncertainty
  ■ TMDP model is adaptable to empirical findings
- Accounts for pilot state
- Developed system designer interface for understanding system
behavior

■ Future work
- System integration and empirical evaluation

5.10 Imbalanced Learning for Functional State Assessment


Imbalanced  Learning  for  Functional  State  Assessment 

Feng  Li,  Frederick  McKenzie  and  Jiang  Li 

Old  Dominion  University 

fiixxOOS^.odu.edu  rcJmckenz^.odu.edu  ili^.odu.eclu 

Guangfan  Zhang  and  Roger  Xu 
Intelligent  Automation,  Inc. 
ozhana(3>i-a-i.com  haxu{d>i-B-icom 


Carl  Richey  and  Tom  Schnell 
University of Iowa

Abstract. This paper presents results of several imbalanced learning techniques applied to operator functional state assessment
where the data is highly imbalanced, i.e., some functional states (majority classes) have many more training samples than other
states (minority classes). Conventional machine learning techniques tend to classify all data samples into majority classes
and perform poorly for minority classes. In this study, we implemented five imbalanced learning techniques, including random
under-sampling, random over-sampling, synthetic minority over-sampling technique (SMOTE), borderline-SMOTE, and adaptive synthetic
sampling (ADASYN), to solve this problem. Experimental results on a benchmark driving test dataset show that accuracies for
minority classes could be improved dramatically at the cost of slight performance degradation for majority classes.


1.0  INTRODUCTION 

An Operator Functional State (OFS) refers
to a multidimensional pattern of the human
psychophysiological condition that mediates
performance in relation to physiological and
psychological costs [1]. Accurate OFS
assessment for human operators plays a
critical role in automated aviation systems
because it can ensure mission success and
improve mission performance [2].

Researchers have proposed various modeling
tools to assess OFS. In Ref. [3], a stepwise
discriminant analysis (SWDA) method and
artificial neural networks (ANN) were
proposed to perform OFS assessment. As a
nonlinear model, the ANN is considered
more advantageous in complex task
situations, especially if multiple features are
used. In Ref. [2], committee machines
proved useful in improving the assessment
accuracy. Errors of individual committee
members can cancel if the errors are
independent. Therefore, improvement can
be achieved if individual members have low
biases and are less correlated, i.e., they are
diversified [4]. In addition to the traditional
“bagging” technique, which generates
multiple versions of a predictor based on the
bootstrap technique to produce the final
prediction [5], performing a feature selection
procedure before training can further reduce
correlations among committee members [2].
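The bagging idea can be sketched in a few lines: each committee member is trained on a bootstrap replicate of the training data, and the committee predicts by majority vote. The base learner here (a one-dimensional nearest-neighbour rule), the function names, and the toy data are assumptions for illustration and are not the committee machine of Ref. [2].

```python
import random
from collections import Counter


def nearest_neighbour_learner(sample):
    """Hypothetical base learner: 1-nearest-neighbour on a 1-D feature."""
    def predict(x):
        return min(sample, key=lambda pair: abs(pair[0] - x))[1]
    return predict


def bagging(data, learner, n_members, rng):
    """Train each member on a bootstrap replicate; predict by majority vote."""
    members = []
    for _ in range(n_members):
        bootstrap = [rng.choice(data) for _ in data]  # sample with replacement
        members.append(learner(bootstrap))

    def committee(x):
        votes = Counter(member(x) for member in members)
        return votes.most_common(1)[0][0]
    return committee


data = [(0.0, "low"), (0.1, "low"), (0.2, "low"),
        (0.9, "high"), (1.0, "high"), (1.1, "high")]
committee = bagging(data, nearest_neighbour_learner, 25, random.Random(0))
print(committee(0.05), committee(1.05))
```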

To successfully perform OFS assessment,
however, researchers often face the
challenge of modeling imbalanced datasets,
i.e., datasets in which some
OFS states have many more data samples
than others. In the machine learning
community, those OFSs having more data
samples than others are named 'majority'
classes, while those having fewer samples are
called 'minority' classes. Traditional
classifiers tend to classify all data samples
into majority classes, resulting in poor
performance for minority classes [6], which
is not acceptable for OFS assessment.

Many imbalanced learning techniques have
been proposed to balance performance
among majority and minority classes. Those
techniques can be divided into four
categories [6]: sampling methods,
cost-sensitive methods, kernel-based methods,
and active learning methods. Sampling
methods aim to reduce the imbalance by
removing samples from majority classes
(under-sampling) or generating more
training samples for minority classes
(over-sampling) [7]. Cost-sensitive methods
improve classification performance by using
different cost matrices to compensate for
imbalanced classes [8]. Kernel-based
methods, such as the support vector
machine (SVM), are based on the principles
of statistical learning and
Vapnik-Chervonenkis (VC) dimensions [9]. Active
learning is a type of iterative supervised
learning technique, which is used in
situations where unlabeled data is
abundant. Active learning is often integrated
into kernel-based learning methods by
selecting the closest instance to the current
hyperplane from the unseen training data
and adding it to the training set to retrain the
model [10].

We previously developed an OFS assessment
strategy based on a committee machine for
a closed-loop adaptive task management
system, where the OFS assessment was
treated as a regression problem [2]. In this
paper, we redesigned a similar model for
the same task; however, we treated the
OFS assessment as a classification
problem. Because the datasets are highly
imbalanced, traditional classifiers failed to
classify minority states. We implemented
several imbalanced learning techniques to
improve classification performance for those
minority OFS states.

The remainder of the paper is organized as
follows: Section 2 describes several
imbalanced learning techniques
implemented in this paper. Section 3
presents the architecture of a committee
classifier. Section 4 illustrates our
experimental design, including
implementation of the imbalanced learning
techniques and the design of a committee
classifier. Section 5 shows our
experimental results. Section 6 discusses
the results and Section 7
concludes the paper.

2.0  IMBALANCED  LEARNING 
TECHNIQUES 

There  exist  many  imbalanced  learning 
techniques  in  the  literature  as  described  in 
the  excellent  review  paper  [6].  In  our  study, 
we  implemented  five  of  them  as  described 
below. 


• Random  under-sampling 

• Random  over-sampling 

• Synthetic  minority  over-sampling 
technique  (SMOTE) 

• Borderline-SMOTE 

• Adaptive  synthetic  sampling  (ADASYN) 

All the methods are detailed in Ref. [6],
including their implementations,
performance and limitations. The overall
goal of these methods is to balance data
samples among classes by dropping some
data samples from majority classes and
adding samples to minority classes, so that
all classes have roughly the same number
of data samples.

2.1  Random  under-sampling 

Random under-sampling was applied only
to majority classes. The method randomly
selects a number of majority data samples
to keep and discards the rest. This method
may lose information in the majority classes.

2.2  Random  over-sampling 

The random over-sampling method was
applied only to minority classes. In contrast
to the random under-sampling technique,
this method randomly selects data samples
from minority classes and duplicates them
until the dataset is roughly balanced. This
method may lead to overfitting because
data samples are used repeatedly.
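
The two random sampling schemes above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; the function name `random_resample` and the per-class `target` count are assumptions introduced here.

```python
import random
from collections import defaultdict

def random_resample(samples, labels, target, seed=0):
    """Bring every class to `target` samples (hypothetical helper).

    Classes larger than `target` are randomly under-sampled; classes
    smaller than `target` are randomly over-sampled by duplication.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) > target:
            # Under-sampling: keep a random subset, discard the rest.
            chosen = rng.sample(xs, target)
        else:
            # Over-sampling: keep all originals, add random duplicates.
            chosen = xs + [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(chosen)
        out_y.extend([y] * len(chosen))
    return out_x, out_y
```

Because the over-sampled class contains only copies of existing points, a classifier can memorize them, which is the overfitting risk noted above.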

2.3  SMOTE 

To overcome the overfitting defect of the
random over-sampling method, SMOTE
synthesizes new samples for minority
classes. To create a new synthetic
sample for a given data point (seed) from
a minority class, it first randomly selects
one of the seed's K nearest minority
neighbors (K is chosen by the researcher).
Then, a random point on the line segment
between the seed and the selected neighbor
is synthesized as a new data sample. SMOTE
may lead to the problem of
over-generalization [12]. The following methods,
Borderline-SMOTE and ADASYN, were
developed to overcome this limitation.
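
The interpolation step can be sketched as follows. This is a minimal sketch assuming Euclidean distance and feature vectors stored as lists of floats; the name `smote_sample` is hypothetical, and the minority set must contain at least one point other than the seed.

```python
import math
import random

def smote_sample(seed_point, minority, k=5, rng=None):
    """Synthesize one sample for `seed_point` (hypothetical helper)."""
    rng = rng or random.Random(0)
    # K nearest minority neighbors of the seed, excluding the seed itself.
    others = [p for p in minority if p is not seed_point]
    neighbors = sorted(others, key=lambda p: math.dist(seed_point, p))[:k]
    neighbor = rng.choice(neighbors)
    # Random position on the segment between the seed and the neighbor.
    gap = rng.random()
    return [s + gap * (n - s) for s, n in zip(seed_point, neighbor)]
```

Every synthetic point lies between two existing minority points, so the new samples are not exact duplicates of the originals.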

2.4  Borderline-SMOTE

Borderline-SMOTE and SMOTE differ in the
way they select seeds. SMOTE may select
any minority sample as a seed, while
Borderline-SMOTE only considers minority
samples that lie on the borderline between
minority and majority classes. A minority
class sample is considered to be on the
borderline if the majority of its M nearest
samples belong to majority classes (M is
chosen by the researcher).
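
The seed-selection rule, as stated in the text above, can be sketched as below. The function name `is_borderline` and the use of Euclidean distance are assumptions. The published Borderline-SMOTE algorithm additionally treats a seed whose M nearest samples are all majority as noise; that refinement is omitted here.

```python
import math

def is_borderline(seed_point, samples, labels, minority_labels, m=5):
    """True if most of the seed's M nearest samples are majority-class."""
    others = [(math.dist(seed_point, x), y)
              for x, y in zip(samples, labels) if x is not seed_point]
    nearest = sorted(others, key=lambda t: t[0])[:m]
    n_majority = sum(1 for _, y in nearest if y not in minority_labels)
    return n_majority > len(nearest) / 2
```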

2.5  ADASYN 

The difference between ADASYN and
SMOTE is the number of new data samples
synthesized for each seed. SMOTE
generates the same number of data
samples for each seed, while ADASYN
synthesizes data samples according to the
distribution of seeds. Among the K nearest
neighbors of a seed, the more that belong to
majority classes, the more new samples are
synthesized for that seed.
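
A minimal sketch of this allocation rule follows, assuming each seed's share of the new samples is proportional to the fraction of majority-class points among its K nearest neighbors. The name `adasyn_allocation` and the rounding to whole samples are assumptions made for illustration.

```python
def adasyn_allocation(majority_neighbor_counts, k, total_new):
    """Number of synthetic samples per seed (hypothetical helper).

    majority_neighbor_counts[i] is how many of seed i's K nearest
    neighbors belong to majority classes.
    """
    ratios = [c / k for c in majority_neighbor_counts]
    total = sum(ratios)
    if total == 0:
        # No seed has majority neighbors: nothing to weight by.
        return [0] * len(ratios)
    # Normalize the ratios so that the shares sum to (about) total_new.
    return [round(total_new * r / total) for r in ratios]
```

A seed surrounded only by minority points receives no synthetic samples, while seeds that are harder to learn receive more.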

3.0  COMMITTEE  MACHINE 

A committee machine is an ensemble of
multiple estimators (committee members),
each of which can be any learning method for
classification or regression. The output of a
committee machine is a fusion of the outputs
from all of its members. A theoretical
interpretation of the committee machine
principle is that the errors from individual
committee members cancel to
some extent if they are uncorrelated.

Research results show that the performance
improvement is affected by two factors:
the accuracies of individual committee
members and the correlations among them [4].
For the first factor, selection of an
appropriate individual model is essential,
because better performance will usually
be achieved if each of the individual
members performs well. For the second
factor, several techniques such as bagging,
boosting, averaging or voting, and mixture of
experts have proved effective [4]. In this
paper, we use the following techniques to
build the committee machine.


• Use the bootstrapping technique to
generate multiple 'copies' of the training
data.

• Apply an advanced feature selection
algorithm, Piecewise Linear Orthogonal
Floating Search (PLOFS) [11], to
diversify the committee members such
that their performances are not highly
correlated.

• Train a Multi-Layer Perceptron (MLP) with
the standard Back Propagation (BP)
algorithm as the base classification model.

• Delete the committee members having
high biases (accuracy < 50%).

• Use the majority vote scheme to fuse
decisions from committee members. For
example, if the majority of the 15
committee members predict class 1, the
final output of the committee is class 1.
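
The last two bullets can be sketched as a single fusion function. This is a minimal sketch, not the authors' code: the name `committee_vote` is hypothetical, and ties are broken in favor of the class label encountered first, which is an assumption because the paper does not state a tie-breaking rule.

```python
from collections import Counter

def committee_vote(member_predictions, member_accuracies, min_accuracy=0.5):
    """Fuse per-member class predictions by majority vote.

    member_predictions[i][j] is member i's predicted class for sample j.
    Members with accuracy below `min_accuracy` are removed first.
    """
    kept = [preds for preds, acc in zip(member_predictions, member_accuracies)
            if acc >= min_accuracy]
    if not kept:
        raise ValueError("no committee member passed the accuracy filter")
    fused = []
    for votes in zip(*kept):
        # Most frequent class among the remaining members for this sample.
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused
```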


The system diagram of the committee
machine is shown in Figure 1.


Figure  1:  Diagram  of  the  Committee  Machine 


4.0  EXPERIMENT  DESIGN 

4.1  The  driving  test  dataset 

We used a driving test dataset to validate
our proposed method for OFS assessment.
The dataset was collected from participants
performing a driving test over the course of
two hours. The collected information
includes a description of the driving task,
system dynamics related information,
performance measures, physiological
signals (128-channel EEG, ECG,
respiration, etc.), and eye tracking. The
workload was also analyzed according to
the driving conditions (city driving, stopped,
highway passing, etc.), and seven OFSs,
which indicate seven workload levels, were
defined.

Six subjects participated in the driving test
and data was recorded in a separate file for
each participant, resulting in six individual
datasets. Each dataset has seven operator
functional states (workload levels) that are
treated as seven classes by our
committee classifier. In each dataset, the
number of data samples per class is not
balanced. Four classes (the minority classes)
have far fewer data samples than the other
three (the majority classes). Table 1 and
Figure 2 show the data distributions for all
classes.

Data distributions are similar for all subjects.
Class 2 has the largest number of samples
(about 35% of the whole data). Classes 3 and
4 have the next largest numbers of
samples (about 20% each). Therefore, around
75% of samples belong to those three
classes. Class 7 has the smallest number of
samples, accounting for less than 1% of the
whole data, and subjects 2, 4 and 6 have
no data at all for class 7. Class 6 is the
second smallest class, with about 3% of
the data samples. Classes 1 and 5
each account for about 5% of the data samples.

4.2  Imbalanced  learning  techniques 

To implement the five imbalanced learning
techniques, we first compute a desired
percentage of data samples per class as

$$N_d = \frac{1}{\text{no. of classes}} \times 100\% = \frac{1}{7} \times 100\% = 14.29\%$$

We then calculate a high threshold ($T_h$) and
a low threshold ($T_l$) for the share of data
samples in each class as

$$T_h = N_d \times (1 + 0.1) = 14.29\% \times 1.1 = 15.71\%$$

$$T_l = N_d \times (1 - 0.1) = 14.29\% \times 0.9 = 12.86\%$$


Class   Data set 1  Data set 2  Data set 3  Data set 4  Data set 5  Data set 6
           (%)         (%)         (%)         (%)         (%)         (%)
1          6.17        8.70        6.29        5.59        3.52        3.86
2         38.34       39.24       33.83       39.66       32.65       39.87
3         19.96       21.42       24.56       32.94       26.39       20.16
4         23.55       19.40       21.07       16.43       31.24       27.05
5          8.03        8.25       11.30        3.03        2.99        6.10
6          3.89        2.98        2.67        2.35        2.99        2.96
7          0.06        0.00        0.28        0.00        0.22        0.00

Table 1: Data Distribution among Classes

Figure 2: Data Distribution among Classes


Classes whose share of data samples exceeds
$T_h$ are considered majority classes, classes
whose share is below $T_l$ are considered
minority classes, and the others
are treated as medium classes.
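
As a worked check of this rule, the sketch below computes the desired share and the two thresholds for seven classes and labels the classes of Data set 2 from Table 1. The function name `label_classes` and the `tolerance` parameter (the 0.1 margin used in the threshold formulas) are naming assumptions.

```python
def label_classes(percentages, tolerance=0.1):
    """Label each class as majority, minority or medium.

    `percentages` maps class id -> share of the data in percent.
    """
    n_d = 100.0 / len(percentages)   # desired share per class
    t_h = n_d * (1 + tolerance)      # high threshold
    t_l = n_d * (1 - tolerance)      # low threshold
    groups = {}
    for cls, pct in percentages.items():
        if pct > t_h:
            groups[cls] = "majority"
        elif pct < t_l:
            groups[cls] = "minority"
        else:
            groups[cls] = "medium"
    return groups, (n_d, t_l, t_h)

# Data set 2 column of Table 1 (percent of samples per class).
dataset2 = {1: 8.70, 2: 39.24, 3: 21.42, 4: 19.40, 5: 8.25, 6: 2.98, 7: 0.00}
groups, (n_d, t_l, t_h) = label_classes(dataset2)
```

For this column the rule yields classes 2, 3 and 4 as majority and classes 1, 5, 6 and 7 as minority, with no medium class.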

With seven classes, $N_d$, $T_l$
and $T_h$ are 14.29%, 12.86% and 15.71%,
respectively. Referring to Table 1, it is clear
that classes 2, 3 and 4 are majority classes.
Classes 1, 5, 6 and 7 are minority classes and
there is no medium class in our datasets. In
order to achieve a balanced dataset, the
data portions in both majority and minority
classes are made roughly equal to $N_d$.
We apply the random under-sampling
technique to the majority classes and four
over-sampling methods to the minority
classes, resulting in four balanced datasets
as shown in Figure 3. For each participant,
the balanced datasets share the same
majority-class data samples but have different
minority-class data samples, depending
upon which over-sampling method is used.

Figure 3: Generation of Balanced Datasets

4.3  Committee  classifier 

The committee classifier consists of a
bootstrap procedure, a feature selection
process and a majority voting scheme (see
Figure 4). An MLP trained by the BP
algorithm was implemented as the base
classification model. The basic procedures
performed by the committee classifier are
as follows:

1. Randomly divide a subject's dataset into
two parts with equal numbers of data
points, one for training and the other for
testing.

2. Generate M bootstrapped datasets from
the training dataset.

3. Apply one of the imbalanced learning
techniques to the bootstrapped
datasets. A balanced dataset is then
obtained for each of the M datasets.

4. Select a set of the most effective features
for each of the balanced datasets using
the PLOFS algorithm. The features selected
for different datasets may be different.

5. Train an MLP classifier for each of the
datasets using the features selected for
that dataset.

6. Apply the trained MLPs to the training
and testing datasets.

7. Generate the final classification result by
majority voting. Only MLPs with training
accuracies greater than 50% are used.
Repeat the above procedures after
exchanging the roles of the training and
testing datasets.

8. Repeat the above steps for each of the
imbalanced learning techniques
described in Section 2.
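
Steps 1 and 2 of the procedure above can be sketched as follows. This is a minimal sketch under stated assumptions: the names `split_in_half` and `bootstrap_datasets` are hypothetical, and each bootstrapped dataset is assumed to have the same size as the training half, which the paper does not specify.

```python
import random

def split_in_half(dataset, seed=0):
    """Step 1: random split into equal-sized training and testing parts."""
    rng = random.Random(seed)
    shuffled = dataset[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def bootstrap_datasets(training_set, m, seed=0):
    """Step 2: draw M datasets from the training set with replacement."""
    rng = random.Random(seed)
    n = len(training_set)
    return [[training_set[rng.randrange(n)] for _ in range(n)]
            for _ in range(m)]

train, test = split_in_half(list(range(10)))
committee_inputs = bootstrap_datasets(train, m=15)
```

Swapping `train` and `test` and repeating the remaining steps gives the second pass described in step 7.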


Figure 4: Design of the Committee Classifier

5.0  RESULTS

We trained a committee classifier for each
of the six participants (datasets); the results
are shown in Tables 2-7 and Figs. 5-10.

In the tables, the 'Untreated' column
shows the results achieved on the original
datasets. The other four columns present
the accuracies (in percent) for each class
achieved by applying the four over-sampling
techniques to the minority classes.
The last row shows the average (overall)
accuracies achieved by each of the
techniques.

6.0  DISCUSSION 

The classification accuracies are highly
imbalanced if no imbalanced learning
technique is used. For instance, minority
class 7 always has 0% accuracy for all
subjects, whereas good performance is
usually achieved for majority classes 2, 3
and 4. Classification accuracies become
more balanced between minority and
majority classes when the four imbalanced
learning techniques are applied to the
minority classes. Accuracies are often
significantly improved for minority
classes, while those for majority classes
decrease slightly. As a result,
the overall performance is slightly
degraded. Note that the different sampling
algorithms appear to perform similarly,
indicating the robustness of the imbalanced
learning techniques.
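
The gap between overall and per-class accuracy can be illustrated with a short sketch. It assumes that overall accuracy is the fraction of all test samples classified correctly; the function name and the toy labels are hypothetical and are not taken from the experiments.

```python
from collections import defaultdict

def per_class_accuracy(y_true, y_pred):
    """Return ({class: accuracy}, overall accuracy)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    per_class = {c: correct[c] / total[c] for c in total}
    overall = sum(correct.values()) / len(y_true)
    return per_class, overall

# Toy example: a classifier that always predicts the majority class.
y_true = [2] * 8 + [7] * 2
y_pred = [2] * 10
per_class, overall = per_class_accuracy(y_true, y_pred)
```

In this toy case the overall accuracy is high even though the minority class is never recognized, which is why per-class accuracies are reported alongside the overall figure.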

Figure 6: Results for Dataset 2

Table 2: Results for Dataset 1

Class     Untreated   OverSample   Smote     Border        AdaSyn
1         94.21%      97.40%       98.44%    96.35%        97.40%
2         99.15%      98.58%       99.08%    98.49%        98.99%
3         90.82%      85.67%       74.24%    [illegible]   82.13%
4         70.53%      57.30%       42.84%    66.85%        59.07%
5          2.80%      38.40%       71.20%    15.60%        26.80%
6         57.02%      89.26%       82.64%    66.12%        73.55%
7          0%         100%         100%      100%          100%
Overall   81.23%      81.01%       77.96%    78.86%        79.34%


Figure 5: Results for Dataset 1


Table 4: Results for Dataset 3

Figure 7: Results for Dataset 3

Table 4: Results for Dataset 4

Table 6: Results for Dataset 6

Figure 7: Results for Dataset 4

Figure 9: Results for Dataset 6

Table 5: Results for Dataset 5

Figure 8: Results for Dataset 5


7.0  CONCLUSIONS 

We have implemented five different
imbalanced learning techniques for OFS
assessment and validated our methods on
driving test benchmark datasets. Experimental
results consistently show that classification
accuracies for minority classes in the tested
datasets are improved dramatically at the
cost of slight performance degradation for
majority classes, indicating that imbalanced
learning techniques could be very useful for
OFS assessment.

In a practical setting, an OFS assessment
model will be trained offline. We can use
the imbalanced learning techniques to
improve the recognition accuracies of the
assessment model for minority OFSs
without severely decreasing assessment
effectiveness for majority OFSs. Once the
model is trained, it will be able to
recognize all possible OFSs relatively
accurately on the fly. This is critical because
some minority OFSs may be highly
correlated with aviation safety.

Our future work includes further testing the
applicability of more imbalanced learning
techniques to the OFS assessment task,
validating those methods on more subjects'
datasets and integrating the most effective
scheme into a real-time OFS assessment
system.

Conclusions

• By  using  imbalanced  learning  techniques, 
classification  accuracies  for  minority  OFSs  are 
improved  dramatically  with  a cost  of  slight 
performance  degradations  for  majority  OFSs 

• Different  sampling  algorithms  appear  to  perform 
similarly 

• Future  work 

- Test  more  imbalanced  techniques 

- Validate  those  techniques  on  more  subjects’  datasets 

Abstract: Human performance modelers at the US Army Research Laboratory have developed an approach for
establishing Soldier high workload that can be used for analyses of proposed system designs. Their technique includes
three key components. To implement the approach in an experiment, the researcher would create two experimental
conditions: a baseline and a design alternative. Next, they would identify a scenario in which the test participants perform
all their representative concurrent interactions with the system. This scenario should include any events that would trigger
a different set of goals for the human operators. They would collect workload values during both the control and
alternative design conditions to see if the alternative increased workload and decreased performance. They have
successfully implemented this approach for military vehicle designs using the human performance modeling tool
IMPRINT. Although ARL researchers use IMPRINT to implement their approach, it can be applied to any workload
analysis. Researchers using other modeling and simulation tools or conducting experiments or field tests can use the
same approach.


1.0  INTRODUCTION 

As system engineers begin to design a system, it is
critical for them to understand how the human
operators will interact with the system. This
understanding is critical because they are
designing the system so the human operators can
accomplish specific goals. The humans' ability to
accomplish these goals, therefore, determines the
effectiveness of the system design. A key
component of the human operators' ability to use
the system, in turn, is their mental workload level.
Mental workload is a key component because it
influences the human operators' performance.

The relationship between human performance and
mental workload is often represented as similar to
the Yerkes-Dodson (1908) inverted-U relationship,
as shown in Fig. 1. As Fig. 1 indicates, when mental
workload is very low, human performance
declines.


Figure 1: Inverted-U relationship between
workload and performance (modified from Yerkes &
Dodson, 1908).


As workload increases, so does human
performance. However, at some point workload
transitions to a level high enough to overload
human mental resources (Wickens, 2008). To
manage the high workload, humans employ
strategies to reduce workload to manageable
levels. These strategies are called workload
management strategies (Little, 1993). A strategy,
for example, might be to stop an ongoing task,
ignore a new task, or perform concurrent tasks
sequentially. All of these workload management
strategies can result in performance decrements.

For over a decade human performance researchers
(Colle & Reid, 2005; Rueb, Vidulich, & Hassoun,
1994; Reid & Colle, 1988; Schlegel, B., Schlegel,
R., & Gilliland, 1988; Grier, Wickens, Kaber,
Strayer, Boehm-Davis, Trafton, & St. John, 2008)
have attempted to refine the inverted-U
representation of workload by identifying the point
where workload and performance transition from
acceptable to unacceptable. They refer to this
transition point as the workload redline or threshold
(Grier et al., 2008). Identifying this workload
threshold is important. If it could be determined,
then human factors researchers could establish a
workload level that is considered acceptable for
optimum human performance. System engineers,
in turn, could use this workload guidance to help
ensure their system designs provide effective
human performance. Despite the many years of
research, however, there is no consensus among
researchers on a workload threshold.

A range of workload threshold values has been
proposed by researchers who used the subjective
workload assessment tool (SWAT) to estimate
workload. These researchers have proposed
SWAT threshold values in the range of 50
(Colle & Reid, 2005; Rueb, Vidulich, & Hassoun,
1994; Reid & Colle, 1988; Schlegel, B., Schlegel,
R., & Gilliland, 1988). The SWAT workload range
is useful for system engineers conducting system
evaluations in which human
participants can give the self-report workload ratings
that SWAT requires.

Not all evaluations of system designs, however,
include human participants who can give self-report
workload ratings. Human performance modeling,
for example, is an effective technique for evaluating
system designs that includes mental workload
evaluation but does not include human participants
(Mitchell, 2000). The human operators are
simulated in human performance models and,
therefore, self-report workload scales such as
SWAT cannot be used. Using human performance
modeling, however, has several advantages over
techniques that use human participants.

Human performance modeling is particularly useful
early in the system development phase, when
finding a representative sample of human users of
the proposed system can be costly and challenging
due to funding constraints. In addition, modeling
can be used when a representative sample of users
is unavailable or only a small sample of users
is available. Finally, it is useful when the design is
still a concept and no system mock-ups exist. For
human performance modeling techniques that
include mental workload prediction as part of the
system design evaluation, a workload threshold
remains critical.

Human  performance  modelers  at  the  Army 
Research  Laboratory  have  developed  an  analytical 
approach  for  establishing  a workload  threshold  they 
can  use  for  evaluation  of  a proposed  system 
design.  Their  technique  includes  three  key 
components.  First,  they  create  a scenario 
containing  segments  with  each  segment 
representing  events  that  change  the  goals  of  the 
operators  of  the  system.  Second,  they  establish  a 
baseline  they  can  use  for  workload  and 
performance  comparisons.  Finally,  for  each  of 
these  segments,  they  select  unique  workload 
threshold  values  for  each  operator  who  will  operate 
the  system. 

In  2009,  the  ARL  modelers  implemented  this 
approach  in  an  analysis  of  the  impacts  of  two 
conceptual  technologies  on  the  workload  and 
performance  of  a tank  crew  (Mitchell,  in  review). 


2.0  CASE  STUDY 

To implement their approach, the ARL modelers
used the human performance modeling tool
IMPRINT (Improved Performance Research
Integration Tool; http://www.arl.army.mil/IMPRINT).
IMPRINT is a stochastic task-network modeling
tool that provides modelers with the capability to
simulate humans performing tasks. The humans
simulated for this project were the tank
crewmembers. Specifically, the ARL modelers built
a model simulating the tasks performed by each
crewmember of a baseline tank. Next, they built a
model to represent the tasks performed by the tank
crewmembers when the vehicle design was
enhanced to include a driver's aid and a loader's
situation awareness display.

In addition to simulating task performance,
IMPRINT also provides modelers with the capability
to predict the mental workload associated with the
tasks individuals perform (Mitchell, 2000). The ARL
modelers used this mental workload option to
predict the mental workload of the crewmembers of
the baseline tank as well as the enhanced tank.
The theoretical basis for the IMPRINT mental
workload option is Multiple Resource Theory (MRT)
(Wickens, 2008).

According  to  MRT,  the  capacity  of  human  mental 
resources  is  limited.  Therefore,  as  an  individual 
performs  a task,  the  task  makes  demands  upon 
these  limited  mental  resources.  Furthermore,  when 
an  individual  performs  two  or  more  tasks 
concurrently,  all  the  concurrent  tasks  demand  some 
of  the  individual’s  mental  resources.  Because  the 
mental  resources  have  limits,  the  demands  of  the 
concurrent  tasks  may  exceed  or  overload  the 
individual's  resources.  The  point  where  the 
individual's  resources  are  overloaded  is  the 
workload  threshold.  When  this  threshold  is 
exceeded,  the  individual  implements  workload 
management  strategies  which  cause  the 
individual's  performance  to  decline. 

Because the IMPRINT workload capability is based
on MRT, its workload predictions are task-based
predictions. Changes in the tank crewmembers'
workload, therefore, are related to changes in the
tasks they perform in the baseline versus modified
tank. If the technologies in the modified tank
reduce crew workload, then the IMPRINT workload
predictions should be lower for the modified than for
the baseline tank model runs.




The IMPRINT tool implements MRT by providing
modelers with the capability to enter the mental
resources required by each task for the human
operators of a proposed system. Furthermore, it
provides numerical values for estimating the
demands of the operators' tasks on their mental
workload. IMPRINT provides these numerical
values in the form of scales. There are seven
scales, one for each resource. The resources
represented by the seven scales are visual,
auditory, cognitive, fine motor, gross motor, tactile,
and speech.

Using the workload scales in IMPRINT, the ARL
modelers selected the appropriate values for each
of the resources that a tank crewmember used for
each task. The IMPRINT software aggregated
these workload inputs across all the tasks the
crewmember was performing every time a new task
was started. IMPRINT then provided an overall
workload score. This overall workload score is
compared to a workload threshold set by the
modelers. If the overall workload score
exceeds the threshold, then a workload
management strategy is triggered within the model.
Modelers can then see the impact of the
crewmember's workload on performance with the
system. Because the workload threshold is the key
to determining whether a workload management
strategy is employed, it was critical for the IMPRINT
modelers to select an appropriate workload
threshold for the tank crew in their analysis.
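
The threshold check described above can be sketched as follows. This is an illustration only, not IMPRINT's algorithm: the text does not give the aggregation formula, so a plain sum of resource demands over concurrent tasks is assumed here, and the function names, task dictionaries and demand values are all hypothetical.

```python
# The seven resources named in the text; demand values are hypothetical.
RESOURCES = ("visual", "auditory", "cognitive", "fine_motor",
             "gross_motor", "tactile", "speech")

def overall_workload(concurrent_tasks):
    """Assumed aggregation: sum every resource demand of every active task."""
    return sum(task.get(r, 0.0) for task in concurrent_tasks for r in RESOURCES)

def strategy_triggered(concurrent_tasks, threshold):
    """True if the overall score exceeds the modeler-set threshold."""
    return overall_workload(concurrent_tasks) > threshold

driving = {"visual": 5.0, "cognitive": 3.0, "gross_motor": 2.0}
talking = {"auditory": 4.0, "speech": 2.0, "cognitive": 2.0}
```

With these made-up demands, performing both tasks at once exceeds a threshold of 15 while driving alone does not, which mirrors how concurrent tasks push a crewmember toward the threshold.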

As the first step in identifying a workload threshold
for the tank crew analysis, the ARL modelers
selected a scenario to model with IMPRINT. For
the performance to be representative of the typical
tank crew, the scenario needed to be one in which
the crew performed the majority of their common
tank crew tasks. These common tank crew tasks
are driving, communicating, searching for targets,
and engaging targets (Directorate of Training,
Doctrine, and Combat Development Field Manual
3-20.15, 2007).

The ARL modelers needed to include common
crew tasks in the scenario because they would
build the tank crew tasks into the IMPRINT model
based on the scenario. For the IMPRINT workload
analysis to be valid, it was critical that the
crew perform, within the model, all the tasks the
technologies might influence. Furthermore, it
was especially important for the ARL modelers to
include in the scenario those common crew tasks
the crewmembers perform concurrently. The
inclusion of concurrent tasks in the models was
important because workload is typically higher, and
performance is typically lower, for concurrent tasks
than for sequential tasks (Just, Carpenter, Keller,
Emery, Zajac, & Thulborn, 2001). To meet these
scenario characteristics, the modelers selected a
movement-to-contact mission (Directorate of
Training, Doctrine, and Combat Development Field
Manual 3-20.15, 2007).

After selecting the scenario, the ARL modelers
divided the mission into segments that represented
changes in the crewmembers' goals. For example,
as a movement-to-contact mission begins, the
crewmembers' goal is to detect the enemy,
whereas once they detect the enemy their goal
shifts to destroying the enemy. As a consequence
of the shift in goals between the two segments, the
crew performs different tasks. Because the
workload predictions in the IMPRINT model are
based on task demands, the crew's workload will
change along with the tasks. Therefore, if the
crewmembers perform a different set of tasks in one
segment than in another, it is reasonable to
assume that their workload will differ substantially
from one segment to the other. For example, in
mission segments during which the tank is
stationary, the driver could engage a target. In
contrast, when the vehicle is moving, the driver is
driving and would not be engaging targets. The
segments the ARL modelers selected to represent
diverse sets of crewmember tasks for the
movement-to-contact mission were: movement to
contact begins, move via checkpoints to the
line-of-departure, precision engagement, and move to
defensive position.

As they begin the movement-to-contact mission,
the goal of the crewmembers is to be ready for the
mission. They perform workstation and
communications equipment set-up. As they move
via checkpoints, their goals shift to searching for
potential enemies. They communicate, drive, search
for threats, track the battle and do hasty planning.
After the enemy is detected, their goals shift again
to destroying the threat. They identify, engage, and
destroy the threat. Finally, after the enemy is
eliminated, their goals shift to avoiding detection by
opposing forces. They back up the vehicle and
drive quickly to a defensive position while avoiding
enemy detection.

For  each  of  these  segments,  the  ARL  modelers  set 
a unique  workload  threshold.  Each  threshold  was 
unique because of the variation in tasks and,
therefore, workload in each mission segment.
Furthermore,  each  crewmember  needed  a unique 
workload  threshold  because  the  crewmembers 
performed  very  different  combinations  of  tasks.  For 
example,  the  PL  does  tactical  planning, 
communications  monitoring,  and  supervisory  tasks 
while  the  driver  drives  the  vehicle.  They  obtained 
the  threshold  values  for  each  crewmember  for  each 
segment  from  an  existing  baseline  IMPRINT  tank 
model.  Mitchell  (2009)  describes  this  model  and 
the  steps  the  ARL  analyst  followed  in  its 
development  in  detail. 

After  developing  the  mission  segments  and 
selecting  thresholds,  the  ARL  analysts  included  the 
segments  in  their  task-network  models.  In  the 
IMPRINT task-network models, the ARL modelers
represented  the  sets  of  tasks  the  crewmembers 
performed  in  each  segment  of  the  scenario  as 
functions.  Driving,  scanning  for  threats,  and 
communications,  for  example,  would  be  functions  in 
the model. Furthermore, the task-network model is
hierarchical, which means that functions, at the higher
level, can be decomposed into smaller units called
tasks.  Thus,  the  ARL  modelers  decomposed  the 
functions  in  each  segment  into  tasks.  Examples  of 
tasks  for  the  driving  function  would  be  maintain 
speed,  adjust  steering,  monitor  forward  terrain,  etc. 

After  creating  the  hierarchical  task-network  of 
functions  and  tasks  for  each  crewmember  in  each 
segment  of  the  scenario,  the  ARL  modelers 
identified  the  interfaces  or  equipment  the 
crewmembers  used  to  perform  the  tasks.  IMPRINT 
provides  modelers  with  the  capability  to  enter  the 
list  of  interfaces  used  by  the  human  system 
operators  for  each  task.  Thus  the  ARL  modelers 
entered  the  list  of  interfaces  each  crewmember 
used  for  each  task  into  the  baseline  tank  model. 
Then,  using  the  IMPRINT  workload  scales,  the 
modelers  estimated  the  demands  that  each  task 
and interface combination placed upon each
crewmember’s  mental  resources  (visual,  auditory, 
cognitive,  gross  motor,  fine  motor,  speech,  or 
tactile). 
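
The demand estimates described above can be pictured as a small table of per-channel values for each task. The following is a minimal sketch only: the demand values are hypothetical, and summing demands across concurrent tasks is an assumption of the sketch rather than a statement of how IMPRINT combines them internally.

```python
from collections import defaultdict

# Resource channels named in the text.
CHANNELS = ("visual", "auditory", "cognitive", "gross_motor",
            "fine_motor", "speech", "tactile")


def overall_workload(concurrent_tasks):
    """Sum per-channel demands over all tasks performed at the same time.

    Summation is an assumption of this sketch, not a documented IMPRINT rule.
    """
    per_channel = defaultdict(float)
    for demands in concurrent_tasks.values():
        for channel, value in demands.items():
            if channel not in CHANNELS:
                raise ValueError(f"unknown channel: {channel}")
            per_channel[channel] += value
    return dict(per_channel), sum(per_channel.values())


# Hypothetical demand values for two concurrent driver tasks.
driver_tasks = {
    "adjust steering": {"visual": 5.0, "fine_motor": 2.5, "cognitive": 1.0},
    "monitor forward terrain": {"visual": 4.0, "cognitive": 3.5},
}
per_channel, total = overall_workload(driver_tasks)
```

Keeping the per-channel totals alongside the overall value makes it possible to see which resource drives a high workload figure.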

Once  the  workload  data  was  entered,  the  ARL 
modelers  ran  the  baseline  tank  model  multiple 
times.  The  multiple  runs  represented  all  the 
possible  combinations  of  functions  and  tasks  that 
the  crewmembers  performed  during  each  segment 
of the mission. Based on these runs, the modelers
then identified, for each crewmember in each
mission segment, the combination of tasks that had
the highest overall workload value. In addition,
they calculated the average workload across all the
runs  for  each  crewmember  for  each  segment.  The 
maximum workload value and average workload
value became the workload thresholds for that
crewmember  for  that  mission  segment  for  the 
baseline  model. 
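
The threshold derivation described above amounts to taking a maximum and a mean per crewmember and segment across the baseline runs. A minimal sketch follows, assuming each run is summarized as one overall workload value per crewmember and segment; the function name and all numbers are hypothetical.

```python
from statistics import mean


def baseline_thresholds(runs):
    """Return max and mean workload per (crewmember, segment) across runs.

    runs: list of dicts mapping (crewmember, segment) -> overall workload.
    """
    by_key = {}
    for run in runs:
        for key, workload in run.items():
            by_key.setdefault(key, []).append(workload)
    return {key: {"max": max(values), "mean": mean(values)}
            for key, values in by_key.items()}


# Hypothetical results of three baseline runs for one mission segment.
segment = "move via checkpoints"
runs = [
    {("driver", segment): 180, ("loader", segment): 50},
    {("driver", segment): 200, ("loader", segment): 60},
    {("driver", segment): 160, ("loader", segment): 40},
]
thresholds = baseline_thresholds(runs)
```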

The  ARL  modelers  then  modified  the  baseline 
model  to  represent  the  crewmembers  performing 
the  tasks  with  the  two  proposed  technologies. 
Specifically,  they  modified  the  interfaces  used  by 
two  of  the  tank  crewmembers,  the  driver  and  the 
loader.  Because  the  interfaces  for  these  two 
crewmembers  were  modified  from  the  baseline,  the 
ARL  modelers  needed  to  modify  the  tasks  these 
two  crewmembers  performed.  For  example,  in  the 
baseline  model,  a crewmember  needed  to  open  the 
hatch  to  do  a specific  task  while  in  the  modified 
model  the  loader’s  display  enabled  the  loader  to 
perform  with  a closed  hatch. 

When  the  modified  model  was  complete,  the  ARL 
modelers  ran  it  multiple  times  and  calculated  the 
same  workload  measures  as  they  had  for  the 
baseline  model.  They  then  compared  the  two 
models  to  see  if  the  crew  workload  in  the  modified 
tank  model  was  higher  than  the  threshold  value 
established  from  the  baseline  model.  If  the 
workload  was  the  same  or  lower,  they 
recommended  the  technologies  for  further  testing. 

If  it  was  higher  than  the  baseline  they 
recommended  evaluating  if  the  potential  for 
overload  was  mitigated  by  an  increase  in 
performance. 
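
The decision rule in the two preceding paragraphs can be written as a short comparison per crewmember and segment. This is an illustrative sketch; the workload values are hypothetical and are not taken from the ARL analysis.

```python
def recommend(modified_workload, baseline_threshold):
    """Apply the decision rule from the text to one crewmember and segment."""
    if modified_workload <= baseline_threshold:
        return "recommend for further testing"
    return "evaluate whether a performance gain mitigates potential overload"


# Hypothetical thresholds (baseline model) and workloads (modified model).
segment = "precision engagement"
baseline_threshold = {("loader", segment): 60, ("driver", segment): 31}
modified = {("loader", segment): 55, ("driver", segment): 40}

decisions = {key: recommend(modified[key], baseline_threshold[key])
             for key in baseline_threshold}
```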

In addition to the workload comparison, the ARL
modelers  compared  the  performance  of  the 
crewmembers  in  the  two  models.  For  example,  the 
loader’s  workload  may  have  remained  the  same  for 
both  models  but  the  technology  may  have 
increased  his  performance  by  permitting  him  to  do 
surveillance  buttoned-up  rather  than  out-of-the- 
hatch.  Furthermore,  a crewmember  performing 
with  an  open  hatch  is  at  a greater  risk  of  injury  than 
with a closed hatch. Greater risk of crew injury, in
turn, represents a great risk to crew survivability
and, therefore, to the overall movement-to-contact
mission.

The  overall  conclusion  of  the  analysis  was  that  the 
new  technologies  did  have  the  potential  to  increase 
mission  performance  while  reducing  crew  workload. 

3.0 DISCUSSION

The  ARL  modelers  found  the  practical  approach  to 
establishing  the  threshold  more  effective  for 
justifying  their  recommendations  for  system  design 
changes  than  other  threshold  techniques  they  had 
used over the past decade. In earlier efforts, the
modelers had used a single overall workload value
of 60 (Mitchell, Samms, Henthorn & Wojciechowski,
2003)  or  40  (Mitchell,  2005)  or  7 (McCracken  & 
Aldrich,  1984)  as  the  workload  threshold  value. 
Although  these  projects  changed  system 
requirements  (Mitchell  & Samms,  2009),  and  the 
results were replicated in experiments (Chen,
2009), the selection of a single workload threshold
for  all  crewmembers  across  a scenario  was 
challenging  for  the  ARL  modelers  to  defend. 
Because the threshold was difficult to defend, it
was difficult to convince system engineers to
change  designs.  The  single  threshold  value  was 
difficult  to  defend  because  the  overall  workload 
values  from  IMPRINT  could  vary  widely  between 
crewmembers  due  to  variations  in  the  functions  and 
tasks  they  perform.  The  driver  of  the  tank,  for 
example,  might  have  a maximum  workload  value  of 
200,  in  contrast  to  the  loader  who  has  a maximum 
workload  value  of  60.  Thus,  with  a single  threshold 
value  of  40,  both  crewmembers  would  be 
overloaded  in  the  baseline  but  one  would  have  a 
much  higher  workload  value  than  the  other.  In  this 
situation,  the  crewmember  with  the  workload  that 
exceeded the threshold by the most would be most
likely  to  be  the  focus  for  system  design  changes.  In 
comparison,  by  identifying  a threshold  for  each 
crewmember,  the  ARL  modelers  had  more 
capability  to  focus  attention  equally  across 
crewmembers  and  influence  system  design 
changes  for  each  crewmember. 

Another  challenge  confronting  the  ARL  modelers 
was  that  the  functions,  tasks,  and  workload  that  a 
single  crewmember  performed  changed 
significantly  from  segment  to  segment.  For 
example,  the  highest  workload  value  in  an 
IMPRINT  model  for  a tank  driver  within  a mission 
might  be  200  and  the  average  across  the  mission 
might  be  100.  This  high  workload  is  associated 
with the driving function and tasks. The workload,
therefore, would not be representative of the
driver’s workload when the platform is stationary.
During  this  mission  segment,  the  driver’s  workload 
would  be  31,  a much  lower  value  because  the 
driver is not driving but is scanning for threats. If
the mission were not divided into segments, this
difference in workload would not be apparent
because average workload across the mission
would be 115.5 and high workload 200. The
practical threshold approach solves this problem by
dividing the operational scenario into segments
representing  changes  in  functions  and  tasks  for 
crewmembers  and  the  associated  workload  value 
changes. 
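
The masking effect in this example can be reproduced with a few lines of arithmetic. The sketch assumes the mission consists of just the two driver segments mentioned above (workload 200 while driving, 31 while stationary) and that both are weighted equally, which is a simplification.

```python
from statistics import mean

# Driver workload per segment, using the two values given in the text.
segment_workload = {"moving": 200, "stationary": 31}

# Mission-level summary: the stationary segment's low value disappears.
mission_mean = mean(list(segment_workload.values()))
mission_max = max(segment_workload.values())

# Segment-level summary keeps the difference visible.
difference = segment_workload["moving"] - segment_workload["stationary"]
```

The mission-level mean evaluates to 115.5 and the maximum to 200, so neither summary reveals that the driver's workload is only 31 while the platform is stationary.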

4.0  CONCLUSION 

ARL  modelers  recommend  the  practical  approach 
to  setting  a workload  threshold  be  used  to  evaluate 
system designs. Although they implemented their
approach  with  the  human  performance  modeling 
tool,  the  practical  threshold  approach  can  be 
applied  to  any  workload  analysis.  Researchers 
conducting  experiments  or  field  tests  can  use  the 
same  approach.  To  implement  the  approach  in  an 
experiment,  the  researcher  would  create  two 
experimental  conditions:  a baseline  and  a design 
alternative. Next they would identify a scenario
which  includes  all  the  goals  of  the  participants  with 
the  system.  They  would  divide  this  scenario  into 
the  segments  that  represent  these  goals.  They 
would  then  have  the  test  participants  perform  all 
their  representative  concurrent  interactions  with  the 
system  in  each  segment.  They  would  collect 
workload values during both the baseline and
alternative design conditions for each segment and
compare  workload  and  performance  of  the 
participants  in  the  two  conditions.  They  would  then 
make  recommendations  based  on  the  workload 
comparisons. 

As  a result  of  this  analysis,  two  enhancements  to 
IMPRINT  were  recommended.  When  the  ARL 
modelers  analyzed  the  results  across  each  mission 
segment,  they  used  the  Function  Performance 
report.  The  Function  Performance  report  provides 
analysts  with  detailed  information  on  function 
duration,  accuracy  and  frequency.  This  report  is 
generated  by  looking  at  all  the  functions  in  the 
model  that  have  started  and  finished  during  the 
model  execution  but  does  not  report  instances 
where  functions  are  stopped  or  interrupted.  The 
same  is  true  for  the  Task  Performance  report  that 
reports  similar  information  but  at  the  task  level. 
Expanding  these  reports  to  include  data  about 
function  or  task  stops  and  interrupts  will  provide 
more  detailed  results  to  the  analyst. 

Another recommended enhancement was to allow
analysts to choose at what level they would like to
define workload thresholds: at the function or
mission segment level or at the overall mission
level. Currently, workload thresholds are set per
operator over the length of the entire mission.
There may be times where different segments of a
mission may have different workload thresholds.
Implementing  this  capability  in  IMPRINT  would 
allow  the  analyst  more  flexibility  in  exploring  new 
workload  theories.  These  enhancements  will  be 
considered  for  implementation  in  the  next  IMPRINT 
development  cycle. 

Mental Workload and Human
Performance 


Inverted-U  relationship  between  workload  and  performance 


Modified from Yerkes, R. M. & Dodson, J. D. (1908). The relation of strength of stimulus to rapidity of habit-formation.
Journal of Comparative Neurology and Psychology, 18, 459-482.


■ Importance  of  Workload 

• Indicator  of  problem  areas  within  system  design 

• Peaks  and  valleys  of  workload  indicate  times  when 
human  performance  may  suffer,  e.g.: 

- Sustained  low  workload  (underload)  leads  to  boredom, 
loss  of  situation  awareness,  and  reduced  alertness. 

- Sustained  high  workload  (overload)  leads  to  fatigue. 

- Workload  peaks  lead  to  dropped  tasks,  increased  task 
time,  cognitive  tunneling,  and  increased  errors. 

• Reduces crew performance and system performance,
and contributes to mission failure

OBJECTIVE:  Achieve  evenly  distributed,  manageable 

workload.  Avoid  both  overload  and  underload. 

Why Human Performance
Modeling  (HPM)? 


Concept  System 

Many  Variables 

Field  Study  Not  Feasible 

Too  Dangerous 


Model  - Test  - Model 

System Performance = f(human performance)


Improved  Performance 
Research  Integration  Tool 




334 users supporting Army, Navy, Air Force,
Marines,  NASA,  Department  of  Homeland 
Security  (DHS),  Department  of  Transportation 
(DoT),  Joint  and  other  organizations 
across  the  country 

IMPRINT can be used to

• Set realistic system
requirements 

• Identify  future  manpower  & 
personnel  constraints 

• Evaluate  operator  & crew 
workload 

• Test  alternate  system-crew 
function  allocations 

• Assess  required 
maintenance  man-hours 

• Assess  performance  during 
extreme  conditions 


• Examine  performance  as  a 
function  of  personnel 
characteristics  and  training 
frequency  & recency 

• Identify  areas  to  focus  test  and 
evaluation  resources 

• Quantify  human  system 
integration  risks  in  mission 
performance  terms  to  support 
milestone  review 

• Represent  humans  in 
federated  simulations 


IMPRINT  is  a trade-off  analysis  tool 

Multiple Resource Theory
(MRT)  in  IMPRINT 


[Figure: Mission tasks (1. monitor alarms; 2. decide response action; 3. pull trigger; ... n. task n) are rated by which brain resources are involved (speech, visual, auditory, motor, cognitive) and by the degree of resource use.]

Cognitive resource scale shown on the slide:

0.0  No Cognitive Activity
1.0  Automatic (simple association)
1.2  Alternative Selection
3.7  Sign/Signal Recognition
4.6  Evaluation/Judgment (consider single aspect)
Higher levels (values not legible): Encoding/Decoding, Recall; Evaluation/Judgment (consider several aspects); Estimation, Calculation, Conversion

Analysis Approach

Quantify  influence  of  human  operator  performance  on  system/mission  performance 


Soldier performance includes mission analysis
Mission relevant performance parameters

Executing the Approach


Run  Analysis 

• Compare  workload 
results  across  conditions 

• Higher  workload  than 
baseline  = performance 
decrements 


Build  Models 

Create  a scenario  with 
segments  representing  events 
that  change  the  goals  of 
system  operators 
Establish  baseline  and 
alternative  system  design 
Select  unique  workload 
threshold  values  for  each 
operator 

Case Study


Examine  impact  of  two  conceptual  technologies  on 
workload  and  system  performance 

BASELINE  model 

Without  technologies 


ALTERNATIVE  model 

With  technologies 


Movement  to  Contact 

Seek,  identify  and  eliminate  potential  threats 


New technologies have potential to increase mission performance while
reducing crew workload


Mitchell, D. K. (in press). Abrams V2 SEP Crew Workload Analysis: Impacts of Two Proposed Technologies. U.S. Army Research Laboratory, Aberdeen Proving Ground, MD.


Summary

• Use  analysis  approach  to  setting  workload 
thresholds  in  HPM  or  experimentation 

• Develop  overarching  scenario 

• Set  up  at  least  two  conditions;  e.g.  baseline  & 
alternative 

• Compare  workload  levels 

• Make  recommendations  based  on  workload 
comparisons 

• Potential  enhancements  for  IMPRINT 

• Expansion  of  function  & task  performance  reports 

• Function  level  workload  thresholds 

5.12 Simulating Visual Attention Allocation of Pilots in an Advanced
Cockpit  Environment 


F. Frische & J.-P. Osterloh & A. Lüdtke
OFFIS Institute for Information Technology
frische@offis.de osterloh@offis.de luedtke@offis.de


This  paper  describes  the  results  of  experiments  conducted  with  human  line  pilots  and  a cognitive  pilot  model  during  interaction  with 
a new  4D  Flight  Management  System  (FMS).  The  aim  of  these  experiments  was  to  gather  human  pilot  behavior  data  in  order  to 
calibrate the behavior of the model. Human behavior is mainly triggered by visual perception. Thus, the main aspect was to set up a
profile  of  human  pilots’  visual  attention  allocation  in  a cockpit  environment  containing  the  new  FMS.  We  first  performed  statistical 
analyses  of  eye  tracker  data  and  then  compared  our  results  to  common  results  of  familiar  analyses  in  standard  cockpit 
environments. The comparison has shown a significant influence of the new system on the visual performance of human pilots.
Furthermore, analyses of the pilot models' visual performance have been performed. A comparison to human pilots' visual
performance  revealed  important  improvement  potentials. 


1.0  INTRODUCTION 

The  European  project  HUMAN  (EC’s  7th 
Framework  Programme)  aims  at  developing 
virtual  test  pilots,  in  order  to  improve  the 
human  error  analysis  of  future  cockpit 
systems  in  early  design  phases,  as  a 
supplement  of  simulator  tests  with  human 
pilots  in  later  design  phases.  In  HUMAN,  a 
4D  Flight  Management  System  (Advanced 
Flight  Management  System,  AFMS)  and  its 
user  interface  (Airborne  Human  Machine 
Interface,  AHMI),  developed  at  the  German 
Aerospace  Center  (DLR  Braunschweig), 
have  been  selected  as  systems  under 
investigation. The virtual test pilots are
instances  of  a cognitive  architecture  named 
CASCaS  (Cognitive  Architecture  for  Safety 
Critical  Task  Simulation,  see  [12]).  Cognitive 
architectures,  such  as  ACT-R  (see  [1]), 
SOAR  (see  [10]),  MIDAS  (see  [4])  and 
CASCaS implement cognitively plausible
theories  for  human  perception,  memory 
operations  and  decision  making.  These 
theories  are  independent  of  specific  human- 
machine  interfaces.  Thus,  cognitive 
architectures  are  applicable  not  only  in  the 
aviation  domain,  but  also  in  the  automotive 
or  maritime  domain.  Perception  of  system 
and  environmental  states  - or  of  entities  in 
the real world in general - is a key factor for
situation awareness and for decision
making [6]. The main channels for
perception on human-machine interfaces
are primarily eyes for visual perception and
secondarily  ears  for  auditory  perception.  A 
third  upcoming  channel  is  the  skin  for  tactile 
interfaces,  but  this  is  - to  our  knowledge  - 
currently  not  implemented  in  any  of  the 
cognitive  architectures  mentioned  before. 
Due  to  the  importance  of  visual  perception 
for  human-machine  interaction,  and  for 
situation  awareness  and  decision  making, 
there  is  a need  for  an  accurate  simulation  of 
visual  performance  in  cognitive 
architectures. 

Introduction  of  new  user  interfaces,  e.g.  into 
common  cockpit  setups,  has  influence  on 
the  visual  attention  allocation.  Examples  for 
this  effect  can  be  found  in  [7].  This  could  be 
explained  by  the  following  two  points:  On 
the  one  hand,  new  interfaces  can  trigger 
attention  bottom-up,  meaning  that  the 
interface  presents  information  in  a very 
dominant  way  which  distracts  visual 
attention  from  other  interfaces.  This  is  often 
referred  to  as  selective  attention,  where  eye 
movements  and  shifts  of  attention  are 
triggered  by  the  onset  of  a salient  stimulus 
[16].  On  the  other  hand,  attention  allocation 
can  be  affected  top-down  because  the  new 
interface  provides  new  functionality  or 
displays  redundant  information  in  a more 
accessible  or  usable  way  than  other 
interfaces  do.  Top-down  attention  is  caused 
primarily  by  underlying  task  models  that 
comprise  the  allocation  of  visual  attention. 
Thus, a cognitive architecture that should
simulate visual attention allocation in a
humanlike way requires both valid cognitive
theories for bottom-up attention and a valid
task  model  embedding  tasks  on  the  new 
interface  into  the  common  task  model  for 
top-down  attention. 

In  this  paper  we  will  present  results  of 
experiments  with  human  line  pilots  and  a 
pilot  model  interacting  in  a cockpit 
environment  containing  the  AHMI.  Although 
the  datasets  have  also  been  used  to 
validate  the  cognitive  theories  for  visual 
attention  allocation  implemented  in  the 
model,  the  main  focus  of  this  paper  is  the 
validation  of  our  task  model  for  scanning 
activities  in  the  new  cockpit  setup. 

In the following section we describe top-down
and bottom-up concepts for visual
attention implemented in CASCaS (section
2). Then, the experiments conducted
(section 3) and the results of these
experiments are presented (section 4). The
paper closes with a short discussion
(section 5) and conclusions (section 6).

2.0  MODELING  VISUAL  ATTENTION 

Visual  attention  allocation  is  a complex 
conglomerate  of  top-down  (active)  and 
bottom-up  (reactive)  processes  triggering 
percept  actions.  Top-down  and  bottom-up 
attention  compete  against  each  other  [3], 
e.g.  a salient  stimulus  might  distract  pilots 
from  tasks  which  they  are  focused  on.  This 
is  often  intended,  e.g.  in  case  of  warnings. 
However,  a salient  stimulus  might  go 
undetected,  because  top-down  attention 
causes  the  eyes  to  move  to  an  area  of 
interest  where  the  stimulus  is  either  out  of 
the  visual  field  or  absorbed  by  a dynamic 
neighborhood.  The  cognitive  architecture 
CASCaS  implements  both  processes.  In  the 
following  subsections  we  will  describe  how 
top-down  and  bottom-up  processes  have 
been  implemented. 

2.1  Top-Down  Attention 

The  top-down  attention  is  driven  via  three 
different levels of consciousness (see
Fig. 1), which are based on Anderson’s three
layers  of  consciousness  named 
autonomous  layer,  associative  layer  and 
cognitive  layer  [2].  This  is  also  in  line  with 
Rasmussen,  who  defined  three  levels  of 
behavior,  called  skill-based,  rule-based  and 
knowledge-based  [13].  While  nearly  zero 
consciousness  is  needed  on  the 
autonomous  layer,  almost  full 
consciousness  is  needed  on  the  cognitive 
layer,  where  decision  making,  planning  and 
problem  solving  are  located. 


Fig. 1: The multi-layered architecture of
CASCaS  consists  of  components  for 
perception,  memory,  knowledge  processing 
and  motor  actions. 

Top-down processes on the associative
layer  are  the  main  driving  factor  for  visual 
attention  allocation  of  pilots,  where  they 
perform  well-learned  rules  to  achieve 
specific  goals.  These  rules  describe 
normative procedures - percept and motor
actions that correctly match specific
situations. With regard to visual attention of
pilots  we  differentiate  between  two  types  of 
procedures:  (1)  scanning  procedures  and 
(2)  interaction  procedures.  Scanning 
procedures  only  contain  percept  actions. 
Pilots  regularly  perform  scanning  of  multiple 
aircraft  and  environment  parameters  in 
order  to  keep  situation  awareness  for 
current  and  future  aircraft  states.  These 
scanning activities are the main driving
factor  for  visual  attention  in  our  pilot  model. 
Interaction  procedures  contain  percept  and 
motor  actions.  They  are  used  to  interact 
with  interfaces  in  the  aircraft,  such  as  the 
AHMI. Percept actions are needed in order
to  assess  current  situations  and  because 
we  assume  that  pilots  look  at  buttons  before 
they  press  them. 

In  CASCaS,  normative  procedures  are 
described  by  formal  rules.  The  rule  format  is 
a Goal-State-Means  (GSM)  format  (see  Fig. 
2).  All  rules  consist  of  a left-hand  side  (LHS) 
and a right-hand side (RHS). The left-hand
side  contains  a goal  in  the  Goal-Part  and  a 
State-Part  specifying  boolean  conditions  on 
the  current  state  of  the  environment  in  the 
memory. Apart from the condition, the State-Part
contains memory-read operators to
specify that, in order to evaluate a condition,
the associated values v_j of interaction
elements i_j have to be retrieved from
memory. The right-hand side consists of a
Means-Part containing motor and percept
operators  (writing  values  and  reading  values 
in  the  simulated  environment),  memory- 
store  operators  as  well  as  a set  of  partially 
ordered  sub-goals. 




State-Part:  Memory-read(i1, v1)
             Memory-read(i2, v2)
             Cond(v1 > v2)


Rules  are  connected  by  a goal  on  the  left- 
hand  side  and  goals  on  the  right-hand  side. 
This  allows  us  to  break  down  complex 
procedures into a hierarchically ordered
structure,  similar  to  hierarchical  task  trees 
with  a small  difference:  Our  rule-based 
description  of  procedures  permits  transitions 
from  lower  levels  to  higher  levels  (see  Fig. 
3).  Each  procedure  consists  of  1...n  goals. 
Each  goal  is  a unique  entity  that  is  allocated 
to  1...m  rules  but  each  rule  is  allocated  to 
exactly  one  goal. 




Fig.  3:  Rules  are  connected  by  goals  on  the 
LHS  and  RHS 


Means-Part:  Motor(i3, v3)
             Percept(i4, v4)
             Goal(g2)
             Goal(g3)


Fig. 2: Procedural knowledge is described in
a specific rule format that consists of a
goal in a Goal-Part, a State-Part and a
Means-Part


During  simulation  the  cognitive  architecture 
selects  rules  based  on  their  left-hand  sides 
and  executes  the  right-hand  sides. 
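
The Goal-State-Means format and the selection step can be illustrated with a small data structure. The sketch below is not the CASCaS implementation: the class and function names are hypothetical, the partially ordered sub-goals are simplified to a plain sequence, and a step limit is added because rules may refer back to higher-level goals.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Rule:
    goal: str                        # Goal-Part
    reads: Tuple[str, ...]           # State-Part: memory-read operators
    condition: Callable[..., bool]   # State-Part: boolean condition
    actions: List[Tuple[str, str]] = field(default_factory=list)  # Means-Part
    subgoals: List[str] = field(default_factory=list)             # Means-Part


def select_rule(rules: List[Rule], goal: str,
                memory: Dict[str, float]) -> Optional[Rule]:
    """Return the first rule whose goal matches and whose condition holds."""
    for rule in rules:
        if rule.goal != goal:
            continue
        values = [memory[name] for name in rule.reads]
        if rule.condition(*values):
            return rule
    return None


def run(rules, goal, memory, max_steps=100):
    """Fire rules starting from one goal and return the executed actions."""
    trace, agenda = [], [goal]
    for _ in range(max_steps):
        if not agenda:
            break
        current = agenda.pop(0)
        rule = select_rule(rules, current, memory)
        if rule is not None:
            trace.extend(rule.actions)
            agenda = rule.subgoals + agenda
    return trace


# Rule for g1 mirrors Fig. 2; the rule for g2 is a made-up example.
rules = [
    Rule("g1", ("i1", "i2"), lambda v1, v2: v1 > v2,
         actions=[("motor", "i3"), ("percept", "i4")],
         subgoals=["g2", "g3"]),
    Rule("g2", (), lambda: True, actions=[("percept", "i5")]),
]
memory = {"i1": 2.0, "i2": 1.0}
trace = run(rules, "g1", memory)
```

With the memory contents shown, the condition v1 > v2 holds, so the rule for g1 fires first and its sub-goals are processed afterwards; g3 has no rule in this sketch and is skipped.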


2.2  Bottom-Up  Attention 

Bottom-up  processes  are  unconscious  and 
triggered  by  the  perceptual  component  of 
CASCaS.  The  main  driving  factor  for 
bottom-up  attention  in  CASCaS  is  a theory 
called  selective  attention.  Selective  attention 
is  an  effect  where  salient  objects,  e.g. 
flashing  lights,  moving  objects,  or  high 
contrasts,  cause  an  automatic  shift  of 
attention  towards  this  object  [16].  Attention 
shifts  can  also  be  triggered  by  acoustic  and 
tactile  stimuli,  which  are  not  investigated  in 
this  paper.  In  terms  of  visual  stimuli,  a 
salient  stimulus  means  a discontinuity  in 
space  or  time  in  the  visual  field.  A 
discontinuity in space represents a
difference  in  a static  property,  like  color, 
brightness,  form  or  orientation.  This  could 
be  for  example  a green  dot  in  a set  of  red 
dots.  In  contrast  to  this,  a discontinuity  in 
time  - or  dynamic  discontinuity  - denotes  a 
dynamic  change,  like  abrupt  onset,  flashing 
or  moving  of  an  object.  This  effect  may  be 
restrained  by  the  top-down  process  or  by 
the  saliency  of  other  objects  nearby,  which 
suppress,  with  their  own  high  saliency,  other 
salient  objects. 

Bottom-up  attention  can  trigger  specific 
procedures on the associative layer, e.g. in
case  of  a flashing  emergency  light  the 
attention  of  pilots  should  be  shifted  to  the 
flashing  light  which  is  followed  by  execution 
of  a procedure  to  handle  the  emergency. 

3.0  EXPERIMENTS 

In  order  to  validate  the  visual  performance 
of  the  model,  experiments  have  been 
conducted  with  human  subject  pilots  and 
with  CASCaS  in  a functionally  equivalent 
simulation  environment.  In  the  following 
sections,  we  will  describe  how  the 
experiments  with  the  human  pilots  have 
been  carried  out. 

3.1  Target  System  AHMI 

The  main  objective  of  our  analysis  is  the 
interaction  between  the  pilot  flying  (PF)  and 
the  AFMS.  The  AHMI  is  a graphical  user 
interface  supporting  interaction  between  the 
AFMS and pilots. Both the AFMS and the
AHMI have been developed by the German
Aerospace Center (DLR, Braunschweig,
Germany). The AHMI presents graphical
information about the current positions of
the ego-aircraft and other aircraft, weather
conditions  and  flight  routes.  It  provides  a 
horizontal  view  (as  shown  in  Fig.  4)  and  a 
vertical  view.  It  supports  onboard 
management  of  flight  trajectories  and 
negotiation  of  trajectory  changes  with  Air 
Traffic  Control  (ATC)  via  Data  Link  to 
reduce  voice-communication.  The  AHMI  is  a 
powerful  tool  for  pilots,  improving 
predictability of conflicts between aircraft or
between planned routes and severe
weather  conditions. 


Fig.  4:  The  AHMI,  a graphical  user  interface 
supporting  interaction  between  AFMS  and 
pilots 

3.2  Flight  Simulator  Setup 

The  experiments  have  been  conducted  in 
the  GECO  (Generic  Experimental  Cockpit) 
simulator,  which  has  been  built  and  is 
maintained  by  the  DLR  in  Braunschweig. 
The  layout  of  the  simulator  has  been 
derived  from  the  Airbus  A350  XWB  aircraft. 

It  is  equipped  with  freely  programmable 
wide-screen  LCD  displays  and  modern  input 
devices  like  side  sticks  and  a Keyboard 
Cursor  Control  Unit  (KCCU),  as  used  in  the 
A380.  The  flight  dynamics  are  derived  from 
a VFW  614  (ATTAS),  as  used  by  the  DLR 
as  a test  aircraft.  The  outside  view  is 
generated  via  three  video  projectors  on  a 
spherical screen with a diameter of 6
meters, providing a highly realistic outside
view. The GECO is a fixed-base flight
simulator equipped with a visual head
tracker (AR-tracking), and an iView-X
eye-tracker system from SMI. Eye-tracker data
has been matched to specific regions
representing  areas  of  interest  (AOI)  where 
visual  attention  allocation  should  be 
analyzed.  These  AOIs  were  the  following: 

• Airborne  Human  Machine  Interface 
(AHMI) 

• Primary  Flight  Display  (PFD) 

• Horizontal Situation Indicator (HSI)

• Engine  Display  (ENG) 

• Flight  Command  Unit  (FCU) 

• Gears and Auto Brake (GAB)

• Outside  view  (Windows) 

In  addition,  pilot  voices  and  all  flight 
parameters  have  been  recorded. 

3.3  Scenarios 

In order to analyze pilot behavior, eight scenarios were defined,
containing different AHMI-related tasks. The scenario that we refer to
in this paper contained three events that pilots had to handle. These
events triggered pilots to re-plan their current flight plan according
to requirements sent by ATC. A flight plan is a list of waypoints the
aircraft has to fly over or fly by. The scenario was divided into three
phases: cruise, approach and landing. Communication between pilots and
ATC was restricted to non-auditory communication via the AHMI, which
allowed uplinks or downlinks of flight plans.

3.4  Participants 

The experiments were conducted with 13 male and 2 female German line
pilots recruited from German airlines. None of the pilots had prior
experience with the AHMI, and only some had been in the GECO before.
All subjects participated as the pilot flying (PF). The crew was
completed by a scripted pilot, who acted as pilot monitoring (PM). The
scripted PM was either a male DLR test pilot or a female first officer
from Lufthansa. In addition to the normal duties of the PM, the
scripted pilot was responsible for the training and supported the
debriefing and analysis by taking notes during the flight.

3.5  Procedure 

The experiments were distributed over two days. The first day started
with a general briefing on the project. Afterwards, the PM conducted
training on the AHMI and the GECO. Once the pilots felt familiar with
the tasks and the simulator, a talk-through was performed in order to
verify that the procedures were well trained. After the talk-through
had been completed successfully, the subjects started to fly the first
scenario. Typically, 2 scenarios were finished on the first day and
5 to 6 scenarios on the second day.

4.0  RESULTS 

In  this  section  we  present  results  of 
analyses  regarding  top-down  visual 
performance  of  human  pilots  and  of  our  pilot 
model  in  a cockpit  setup  containing  the 
AHMI.  The  analysis  is  based  (1)  on  eye- 
tracker  data,  which  have  been  recorded 
during  the  experiments  with  human  pilots 
and (2) on log files for the pilot model. The
output of both data sources has been pre-processed
into a comparable format containing timestamps
$t_{1,\dots,n}$ and AOIs $aoi_{1,\dots,m}$
describing where pilots have looked at a
specific time. Each $t_i$ in the datasets is
associated with exactly one $aoi_j$. The
experimental  cockpit  has  been  divided  into 
7 AOIs  (see  section  3.2)  in  order  to  analyze 
the  gaze  distribution.  However,  the  results 
presented  in  this  paper  focus  on  4 AOIs 
(AHMI,  PFD,  HSI,  windows)  which  have 
been  selected  after  a first  review  of  the  data 
for  the  following  reasons:  AHMI,  PFD  and 
HSI  are  the  main  displays  for  monitoring 
aircraft  and  environmental  states  in  our 
scenarios  during  all  flight  phases.  The 
windows  are  very  important  for  perception 
of the outside world during the landing. We
segmented the datasets according to the three flight
phases (cruise, approach, landing) and
calculated the percent dwell time (PDT) for
each phase. PDT expresses the dwell time spent
on a specific AOI as a percentage of the total
dwell time spent on all observed AOIs. We
analyzed the PDTs on two levels: First, we
compared the results across phases separately
for the human pilots and for the model. Second,
we compared the human data to the model data.
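
The PDT definition above can be illustrated with a minimal sketch. It assumes that each pre-processed record is a (timestamp, AOI) pair and that a gaze sample lasts until the next timestamp; the function name `percent_dwell_times`, the `end_time` parameter and the sample values are hypothetical and are not taken from the experiments.

```python
from collections import defaultdict


def percent_dwell_times(samples, end_time):
    """Return the percent dwell time (PDT) per AOI.

    samples  : list of (timestamp, aoi) pairs sorted by timestamp
    end_time : time at which the last gaze sample ends
    Assumption: each sample lasts until the next timestamp.
    """
    dwell = defaultdict(float)
    following = samples[1:] + [(end_time, None)]
    for (t, aoi), (t_next, _) in zip(samples, following):
        dwell[aoi] += t_next - t
    total = sum(dwell.values())
    return {aoi: 100.0 * d / total for aoi, d in dwell.items()}


# Hypothetical gaze samples (seconds, AOI); not data from the study.
samples = [
    (0.0, "AHMI"),
    (3.0, "PFD"),
    (4.0, "AHMI"),
    (7.0, "HSI"),
    (8.0, "Windows"),
]
pdt = percent_dwell_times(samples, end_time=10.0)
for aoi, value in sorted(pdt.items()):
    print(f"{aoi}: {value:.1f}%")
```

Segmenting by flight phase would then amount to calling the function once per phase with the samples recorded in that phase.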
4.1 Human Performance

The  gaze  distribution  of  pilots  during  flight 
can  be  seen  as  the  main  indicator  of  how 
important  specific  areas  are  for  flying  an 
aircraft  - from  a pilot’s  point  of  view. 

Huettig, Anders and Tautz [9] revealed the dominance of the PFD in
modern glass cockpits with a value of around 40%. For the HSI a value
of around 20% was measured. This result is in line with results
published by Mumaw, Sarter and Wickens (see [11] and [14]), who
analyzed the monitoring behavior of pilots on an automated flight deck.
They measured 35% on the PFD and 25% on the HSI. Furthermore, eye
movement analyses with a Boeing 747-400 desktop simulator were
conducted by Dietz et al. (see [5]).

Table 1 gives an overview of our results, listing the average PDTs of
our human subjects for each flight phase.

Values do not sum to 100 because dwells on the other AOIs are included
in the total but are not displayed.


            Cruise   Approach   Landing
AHMI          60        42         21
PFD           15        28         40
HSI            7        11         12
Windows        6         7         17

Table 1: Aggregated PDTs (%) of human pilots during flight phases
cruise, approach and landing


In contrast to the results mentioned above, our results reveal a
dominance of the newly introduced AHMI, with a value of 60% during
cruise phase. The PFD, with a value of 15%, is far behind the AHMI.
This emphasizes the role of the AHMI in our scenarios. The HSI, with a
value of 7%, is behind the PFD, which is in line with results reported
in the literature. During cruise the outside view is not important;
thus, with a value of 6%, the windows are behind the HSI. From cruise
to approach, PDT on PFD increases by 13 percentage points, while PDT on
AHMI decreases by 18 percentage points. PDT on HSI increases by 4
percentage points and PDT on windows by 1 percentage point. From
approach to landing, PDT on PFD increases by 12 percentage points and
PDT on AHMI decreases by 21 percentage points. PDT on HSI increases by
1 percentage point and PDT on windows by 10 percentage points. Thus,
from approach to landing the rank order of AHMI and PFD changes, as
does the order of HSI and windows. We assume that changes in gaze
distribution between different flight phases are caused by different
task models for each flight phase. For example, the high values on
windows during landing phase are caused by the upcoming landing task,
which triggers the pilot to monitor the runway. Low values on the AHMI
during landing phase are caused by degradation of the navigation task.
These changes are caused by top-down attention as described in
section 2.1.
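
The phase-to-phase changes and rank orders discussed above can be recomputed directly from the values in Table 1. The following sketch is only a bookkeeping aid; the helper names `deltas` and `rank_order` are illustrative and not part of the original analysis.

```python
# PDT values (%) of the human pilots, copied from Table 1.
human_pdt = {
    "AHMI": {"cruise": 60, "approach": 42, "landing": 21},
    "PFD": {"cruise": 15, "approach": 28, "landing": 40},
    "HSI": {"cruise": 7, "approach": 11, "landing": 12},
    "Windows": {"cruise": 6, "approach": 7, "landing": 17},
}


def deltas(pdt, start, end):
    """Change in PDT (percentage points) per AOI between two phases."""
    return {aoi: values[end] - values[start] for aoi, values in pdt.items()}


def rank_order(pdt, phase):
    """AOIs sorted from highest to lowest PDT in the given phase."""
    return sorted(pdt, key=lambda aoi: pdt[aoi][phase], reverse=True)


cruise_to_approach = deltas(human_pdt, "cruise", "approach")
approach_to_landing = deltas(human_pdt, "approach", "landing")

print(cruise_to_approach)
print(approach_to_landing)
print(rank_order(human_pdt, "approach"))
print(rank_order(human_pdt, "landing"))
```

Comparing the two rank orders shows both swaps mentioned in the text: AHMI and PFD exchange places, and so do HSI and windows.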

4.2  Model  Performance 

The model results show a strong dominance of the AHMI during cruise
phase, with a value of 65%. With a value of 31%, the PFD is behind the
AHMI. From cruise to approach there is only a small change to 66% on
the AHMI, and PDT on PFD does not change. From approach to landing the
rank order of AHMI and PFD changes: PDT on AHMI decreases from 66% to
35% and PDT on PFD increases from 31% to 53%. HSI and windows are at a
very low level between 0% and 2% during all phases. All results are
presented in Table 2.


            Cruise   Approach   Landing
AHMI          65        66         35
PFD           31        31         53
HSI            0         2          2
Windows        1         1          1

Table 2: Aggregated PDTs (%) of pilot model during flight phases
cruise, approach and landing


4.3  Model  Validation 

Human  performance  data  has  been  used  to 
validate  the  visual  performance  of  the  pilot 
model  based  on  two  dimensions,  trend  and 
local  fitness,  that  are  often  used  in  the 
domain  of  cognitive  model  validation. 




4.3.1  Measure  of  Trend 

A trend describes how a dependent variable $v_d$ develops in relation
to an independent variable $v_i$. We have measured the variable gaze
distribution (= $v_d$) in relation to the flight phases (= $v_i$) for
the human pilots and for the pilot model. An aspect of model validity
is trend consistency, meaning that the relation between $v_d$ and $v_i$
is the same for the model and for the real-world aspect observed. In
the area of cognitive model validation, Pearson's correlation
coefficient ($r$ and $r^2$) is a common measure of trend (see e.g. [7]
and [15]). Looking at the performance of the pilot model applying our
scanning procedure, it can be seen that it fits the human visual
performance rather well, with $r^2 = 0.85$.
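
As a plausibility check, the trend measure can be recomputed from the aggregated values in Tables 1 and 2. The text does not state exactly which data points entered the correlation; the sketch below assumes that all 12 (AOI, flight phase) PDT pairs are pooled. Under that assumption the squared correlation comes out at roughly 0.85.

```python
import math

# PDTs (%) in the order AHMI, PFD, HSI, Windows, each for
# cruise, approach and landing (values from Tables 1 and 2).
human = [60, 42, 21, 15, 28, 40, 7, 11, 12, 6, 7, 17]
model = [65, 66, 35, 31, 31, 53, 0, 2, 2, 1, 1, 1]


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(human, model)
r_squared = r * r
print(f"r = {r:.2f}, r^2 = {r_squared:.2f}")
```

Pooling all AOIs means the correlation is driven largely by the difference between the dominant displays (AHMI, PFD) and the rarely fixated ones (HSI, windows), which is one reason a trend measure alone does not validate a model.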

Figure  5 visualizes  trends  based  on  PDTs 
measured  for  the  human  pilots  and  for  the 
pilot  model  during  the  flight  phases  cruise, 
approach  and  landing. 


[Figure: PDT over the flight phases cruise, approach and landing; lines for human and model; markers for AHMI, PFD, HSI and windows]

Fig.  5:  Comparison  of  gaze  distribution  for 
human  pilots  and  pilot  model  across  flight 
phases 

AHMI and PFD are the most dominant displays during all phases for
human pilots and for the pilot model. AHMI and PFD change their rank
order from approach to landing phase. The human data trends for AHMI
and PFD between cruise and approach have not been captured by the
model, whose values remain nearly unchanged between cruise phase
(AHMI = 65%; PFD = 31%) and approach phase (AHMI = 66%; PFD = 31%).
Indeed, the trend for AHMI between these phases is slightly contrary to
the human findings. The human data trend on HSI has been well captured
by the model, where PDT increases from cruise (= 0%) to approach
(= 2%) and then holds this level from approach to landing (= 2%). The
model's PDTs for the windows are constant across all flight phases
(= 1%). We had problems modeling
this  AOI,  because  dynamic  AOIs,  such  as  a 
runway  “moving”  on  the  windows,  currently 
cannot  be  modeled  within  the  architecture. 
Thus,  we  are  not  able  to  provide  the  model 
with  information  that  is  gathered  by  human 
pilots  when  they  are  looking  out  of  the 
windows.  Nevertheless,  during  our 
experiments  we  implemented  some  kind  of 
“blind  scanning”  on  the  windows  in  order  to 
simulate  transitions  between  windows  and 
displays.  The  intention  was  to  model  the 
effect  of  not  looking  at  displays  (for 
whatever  reason)  which  has  been  identified 
as  a cause  for  long  reaction  times  because 
visual  signals  such  as  flashing  buttons  are 
not  in  the  visual  field  (see  section  2.2).  This 
may  also  impact  pilots’  situation  awareness. 

4.3.2  Measure  of  Location 

We  analyzed  the  local  fitness  of  gaze 
distribution  by  comparing  the  Root  Mean 
Squared  Successive  Differences  (RMSSD) 
values  of  human  pilots  and  the  pilot  model 
as  presented  in  [14].  Local  fitness  measures 
of  model  to  human  data  are  a bit 
problematic  as  trying  to  optimize  local 
parameters  bears  the  danger  of  overfitting 
the  model.  Instead  of  fitting  the  model  to  a 
static  parameter  value,  it  is  more 
reasonable  to  fit  the  model  into  a range  of 
parameter values. RMSSD can be used to gain insight into the
differences of performance between an individual subject $s_i$ and a
group $g_j$ of individual subjects. We calculated RMSSD for each of the
human subjects, pulling them one at a time, without replacement, from
the group. In our case the group contained 10 subject datasets and we
tested the fit of $s_1$ to $s_{2 \ldots 10}$, then data from $s_2$ to
$s_1, s_{3 \ldots 10}$ and so on. The results of these measures were 10
values, one for each pilot, describing the deviation from the
performance of the group. This approach has been extensively described
in [8]. We also calculated RMSSD for our pilot model by comparing the
model dataset to the group of all human subjects $s_{1 \ldots 10}$.
Next, we will focus on results regarding the cruise phase as this is
the most important phase for pilot interaction with the AHMI. Results
are depicted in Fig. 6. RMSSD values for human subjects range from 5.52
for subject PF_03 to 24.11 for subject PF_09. The RMSSD for the pilot
model is 19.21, which is within the range of human subject values.
However, a comparison of this value to the median of the human pilots'
RMSSDs (= 7.63) shows that the model result is closer to the maximum
than to the median.
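
The leave-one-out procedure described above can be sketched as follows. The exact RMSSD formula is not reproduced in this section, so the sketch substitutes a plain root mean squared deviation from the group mean as a stand-in; the function names and all PDT vectors are hypothetical and only illustrate the comparison logic, not the reported values.

```python
import math


def rms_deviation(subject, group):
    """Root mean squared deviation of one PDT vector from the group mean.

    Stand-in for the RMSSD measure; the original formula may differ.
    """
    n = len(group)
    means = [sum(g[k] for g in group) / n for k in range(len(subject))]
    squared = [(s - m) ** 2 for s, m in zip(subject, means)]
    return math.sqrt(sum(squared) / len(subject))


def leave_one_out(subjects):
    """Compare each subject with the group formed by all other subjects."""
    return {
        name: rms_deviation(
            vec, [v for other, v in subjects.items() if other != name]
        )
        for name, vec in subjects.items()
    }


# Hypothetical cruise-phase PDTs (AHMI, PFD, HSI, Windows); not measured data.
humans = {"A": [60, 15, 7, 6], "B": [62, 14, 8, 5], "C": [40, 30, 12, 10]}
model = [65, 31, 0, 1]

human_scores = leave_one_out(humans)
model_score = rms_deviation(model, list(humans.values()))
in_range = min(human_scores.values()) <= model_score <= max(human_scores.values())
print(human_scores, model_score, in_range)
```

The final check mirrors the validation criterion used here: the model is compared with the group of all human subjects and should fall within the range spanned by the individual human scores.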

[Figure: RMSSD values for subjects PF_01 to PF_10 and for the model]

Fig.  6:  Comparison  of  RMSSD  values  for 
PDTs  of  human  pilots’  and  pilot  model's  gaze 
distribution  in  cruise  phase 

Except for subjects PF_02 and PF_09, all pilots are below a value of
12.0, which indicates that these two results are outliers in the
sample. Analysis of the outlier datasets showed that the deviations are
caused by differences in PDTs on the AHMI. We measured 60% mean PDT and
61% median PDT on the AHMI, which hints at a well-balanced
distribution. For PF_02 we measured 80% PDT (= max) and for PF_09 we
measured 39% PDT (= min). PDTs of PF_02 on other AOIs were much lower,
those of PF_09 much higher.

An explanation could be that PF_02 used redundant information shown on
the AHMI (such as speed, altitude) for monitoring. Thus, he has
integrated the AHMI into his scanning procedure (top-down attention).

On the other hand, PF_09 used the AHMI only if he had to react to ATC
uplinks (bottom-up attention) instead of including the AHMI in his
scanning procedure.

5.0  DISCUSSION 

Analysis  of  visual  attention  is  a useful 
means  for  assessment  of  situation 
awareness  and  derivation  of  task  models  for 
scanning  activities  in  cockpit  environments. 
We  have  modeled  scanning  procedures  for 
an  advanced  cockpit  environment  and 
performed  experiments  with  a pilot  model 
applying  this  procedure  and  with  human 
subject  pilots.  We  used  the  visual 
performance  data  recorded  for  the  human 
pilots  and  for  the  pilot  model  to  validate  the 
visual performance of the model. While
Pearson’s $r$ and $r^2$ are useful trend
measures, RMSSD can be used to measure
the local match between model and human
data. Good results for Pearson’s $r$ and $r^2$ are
not sufficient to validate a model. A valid
model  must  also  perform  within  the  natural 
range  measured  for  the  variable  under 
observation  of  the  human  subjects. 
Comparing  our  result  for  the  trend  measure 
between  human  pilots’  and  model  gaze 
distribution  with  the  results  of  local  fitness, 
we derive the following: As the trend measure between model and human
performance revealed a good fit, we assume that we have a rather good
estimate of how important specific AOIs are for the pilots relative to
the flight phases. As the gaze distribution is a good indicator for the
correctness of the scanning tasks in the different flight phases, we
also assume that we have a correct understanding of the importance of
specific scanning tasks performed in these flight phases. However,
RMSSD revealed that the performance of the model is at the upper bound
of human subjects’ performance.

This can be improved by decreasing gaze on AHMI and PFD, and increasing
gaze at least for the HSI, which has not been modeled sufficiently.
Gaze on the windows has not been modeled adequately. It has to be
discussed whether it is reasonable to direct attention to an area whose
functionality cannot be simulated, only to provoke effects related to
bottom-up attention. Alternatively, only flight in cruise phase could
be modeled, which has been shown to be the most relevant flight phase
for AHMI interaction.

6.0  CONCLUSIONS 

In  this  paper  we  have  presented  results 
concerning  the  visual  attention  allocation  of 
human  pilots  and  of  a pilot  model  in  an 
advanced  cockpit  environment.  We  have 
been  able  to  show  that  the  AHMI,  a new 
interface  for  aircraft  navigation,  has  a strong 
influence  on  the  gaze  distribution  of  pilots 
due  to  task  models  underlying  the  flight 
phases. Tasks (especially scanning activities) have been modeled in a
rule-based language. These rules have been applied by our pilot model
as procedural knowledge during the flight phases. Analyses of human
pilot performance and model performance in the dimensions of trend and
local fitness revealed that there is still potential for improving the
scanning behavior of the model. An open question is whether it is
useful to model “blind scanning” on AOIs whose functionality cannot be
simulated in order to provoke effects related to bottom-up attention.

Motivation


Support evaluation of system designs in early phases of system development process

The Role of Visual Perception
for Modeling Pilot Behavior


1. Evaluation of situation depends on
parameter values

2. Parameters have to be perceived from
the environment

3. Pilots perceive environment mainly via
visual channel

4. No valid model of visual perception ->
No valid model of pilot behavior

Visual Attention: Top-Down


• Active  scanning  behavior 

• Depends  on  context  of  situation 

• Abstraction  of  SEEV  Model  (Saliency,  Expectancy,  Effort,  Value) 

• Probability  value  for  each  AOI 


[Figure panels: Cruise, Landing]

Improved Top-Down Attention


• Pilots  tend  to  optimize  scanning  behavior 

• Probabilities  on  transitions  between  AOIs 

• Different  probability  values  for  each  transition 
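
The idea of attaching a probability to each transition between AOIs can be sketched as a simple Markov chain. The probability values, the function names and the choice of a first-order chain are assumptions made for illustration; they are not the values or the mechanism used in the pilot model.

```python
import random

# Hypothetical transition probabilities between AOIs for one flight phase.
# Each row gives the probabilities of the next AOI and sums to 1.
transitions = {
    "AHMI": {"AHMI": 0.5, "PFD": 0.3, "HSI": 0.1, "Windows": 0.1},
    "PFD": {"AHMI": 0.8, "PFD": 0.1, "HSI": 0.05, "Windows": 0.05},
    "HSI": {"AHMI": 0.5, "PFD": 0.5, "HSI": 0.0, "Windows": 0.0},
    "Windows": {"AHMI": 0.2, "PFD": 0.5, "HSI": 0.1, "Windows": 0.2},
}


def next_aoi(current, rng):
    """Draw the next AOI according to the transition probabilities."""
    targets = list(transitions[current])
    weights = [transitions[current][t] for t in targets]
    return rng.choices(targets, weights=weights, k=1)[0]


def simulate_scan(start, steps, seed=0):
    """Generate a gaze sequence of steps + 1 AOIs starting at start."""
    rng = random.Random(seed)
    sequence = [start]
    for _ in range(steps):
        sequence.append(next_aoi(sequence[-1], rng))
    return sequence


scan = simulate_scan("AHMI", steps=20)
print(scan)
```

A separate table per flight phase would reflect the context dependence of top-down attention, since the relative importance of the AOIs differs between cruise, approach and landing.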


[Figure: AOI transition diagram for the cruise phase with transition probabilities p=0.8, p=0.5, p=0.2 and p=0.5]

Visual Attention: Bottom-Up


• Reactive  scanning  behavior 

• Depends  on  saliency  of  objects  in  visual  field 

• SEEV Model: Saliency


GOAL(handle_emergency)

Visual Attention Model

• Consists of

- Top-down attention (active)

- Bottom-up attention (reactive)

• Visual attention is mainly influenced by top-down
attention

- Considers  context  of  different  situations 

- Supports modeling of human optimization strategies

Experimental Setup

• Scenario duration: ~35 minutes

• 15  Airline  pilots 

- 13  male,  2 female 

- Average  Age:  34.0  (SD  5.9) 

• Events  triggering  re-planning  on  AHMI 

• Three flight phases: cruise, approach, landing

Target System: AHMI


• Airborne  Human  Machine  Interface 

• Data  link  communication  between  Crew  and  ATC 

• Negotiation  of  4D  flight  plans  and  trajectories 

• View  on  ego-aircraft 

- Horizontal 

- Vertical

Simulator Layout and AOIs

Percent Dwell Time


Results

• Focus of pilots’ attention is mainly on AHMI

• AHMI has no influence on rank order of standard
displays


Comparison  of  mean  PDTs  of  human  pilots  to  literature 
(aggregated  for  all  flight  phases) 


[Bar chart: mean PDT per AOI (AHMI, PFD, HSI/ND, MCP/FCU, CDU, Windows) for Tautz et al., Mumaw et al., Dietz et al. and our experiments]

Measure  of  Trend 


[Figure: PDT trends of human pilots and pilot model across the flight phases cruise, approach and landing, for AHMI, PFD, HSI and windows]

Measure of Location: RMSSD*


1.  Calculate  individual 
difference  of  each 
human  pilot  to  the  group 

2.  Calculate  difference  of 
virtual  pilot  to  the  group 


3.  See  if  model  is  in  range 
of  human  performance 


*in cruise phase

Abstract: Being “lost” is an exemplar of imperfect Situation Awareness/Situation Understanding (SA/SU) -
information/knowledge that is uncertain, incomplete, and/or just wrong. Being “lost” may be a geo-spatial condition - not
knowing/being wrong about where to go or how to get there. More broadly, being “lost” can serve as a metaphor for
uncertainty and/or inaccuracy - not knowing/being wrong about how one fits into a larger world view, what one wants to
do, or how to do it.

This paper discusses using agent based modeling (ABM) to explore imperfect SA/SU, simulating geo-spatially “lost”
intelligent agents trying to navigate in a virtual world. Each agent has a unique “mental map” - its idiosyncratic view of its
geo-spatial environment. Its decisions are based on this idiosyncratic view, but behavior outcomes are based on ground
truth. Consequently, the rate and degree to which an agent’s expectations diverge from ground truth provide measures of
that agent’s SA/SU.


1.0  INTRODUCTION 

A current emphasis in the development
of information systems technologies is
improving situation awareness/situation
understanding (SA/SU)* for military and
civilian applications. Such improvement
requires understanding what is, or may
be, wrong with current capabilities.

Modeling and simulation (M&S) can
play a significant role in exploring
problems with current capabilities, as
well as in assessing the efficacy of new
or proposed information technologies
and determining how best to employ
them. Unfortunately, current M&S tools
face major limitations with respect to the
representation of imperfect SA/SU.

These  tools  do  a reasonable  job  in 
representing  incomplete  SA/SU, 
supporting  decision-making  and  risk 
assessment  with  respect  to  missing 


* Rather than engage in a discussion as to the differences
between SA and SU, I choose to blur them together to a
single over-arching concept, following the pragmatic
definition of [Adam 1993]: “knowing what is going on so I
can figure out what to do.” For more on SA/SU the reader is
directed to [Middleton 2010] and the references therein.


data. They fall short in their ability
to investigate and assess the
consequences of incorrect and
inconsistent SA/SU, which requires
exploring how to recognize and correct
SA/SU based on information that is just
plain wrong.

While  the  focus  on  incomplete  SA/SU 
probably  reflects  the  current  emphasis 
on  providing  more  information  to  war 
fighters  through  improved  information 
technologies,  it  discounts  equally 
pertinent  issues  with  respect  to  the 
capabilities  and  fallibilities  of  the  human 
operator.  Although  it  is  hard  to  argue 
against  giving  decision  makers  more 
data,  it  is  true  that  humans  can  (and 
frequently do) function well with
information that is incomplete or
imprecise. Incorrect or flawed
information may be even more
problematic  for  SA/SU  and  associated 
decisions  than  missing  data.  For  one 
thing,  plans  based  on  known  data  gaps 
and  uncertainties  are  generally  more 
robust  to  account  for  unknown  factors. 
Plans  based  on  wrong  information  may 
rely  too  heavily  on  fallacious 
assumptions  to  optimize  outcomes,  with 
potentially catastrophic results. In
addition, an incorrect understanding of
an  operational  situation  may  bias 
subsequent  information  processing,  and 
lead  to  flawed  decision-making  based 
on  persistent  problems  with  SA/SU. 

1.1  Objective  and  Approach 

This paper examines the nature of
imperfect SA/SU, how individual
decision-makers might recognize
problems in their SA/SU, how they might
seek to correct those problems, and/or
strategies they might employ to mitigate
the negative effects of imperfect SA/SU.
The paper is based on an easily
appreciated exemplar of imperfect
SA/SU, the concept of being “lost”. The
“being lost” exemplar is attractive for a
number of reasons:

• First,  in  both  civilian  life  and  military 
operations  being  “lost”  is  a metaphor 
for  uncertainty  as  to  how  one  fits  into  a 
larger  context  or  world  view,  not 
knowing  exactly  what  to  do,  or  worse, 
not  knowing  where  one  wants  to  go 
and  what  one  wants  to  accomplish. 

• Second, the phenomenon of being lost
in the non-metaphorical sense, i.e.,
geo-spatially “lost”, provides context
for decision-making in which imperfect
SA/SU can be expressed in terms
of concrete, measurable characteristics
of the environment, describing natural
and man-made geographical features
and expressed in mathematically
rigorous geometrical and topological
relationships.

• Third, since many M&S tools already
incorporate extensive, technologically
mature representations of terrain and
geo-spatial relationships, modeling the
phenomenon of being geo-spatially “lost”
provides an accessible and easily
understandable test bed for exploration
of imperfect SA/SU.

• Finally, there is a well-documented
body of research dealing with human
way finding, route planning, and
navigation, all of which are characteristic
of general human abilities to function
with imperfect SA/SU. See for
example: [Rabul 2001]; [Timpf 2002];
[Timpf & Kuhn 2002]; [Richter & Klippel
2005]; [Klippel & Winter 2005]; [Reece,
Kraus & Dumanoir 2000].

1.2  Background:  Agent  Based 
Modeling 

This paper proposes the use of agent
based modeling (ABM; see for example:
[Ilachinski 1997, 2004];
[Borshchev & Filippov 2004]; [Macal &
North 2005]; [Middleton 2008]; [Easton
& Barlow 2002]) to simulate being
geo-spatially “lost” in a simulation world.

In ABM, agents (simulated entities) make
decisions  according  to  their  own 
individual  (and  probably  imperfect) 
SA/SU.  Each  entity  will  have  a 
“perceived  truth”  knowledge  base  - an 
idiosyncratic  view  of  the  operational 
situation,  as  seen  by  that  individual  and 
obscured  by  the  agent’s  local  “fog  of 
war”. 

This paper argues that monitoring the
divergence between this idiosyncratic
view and simulation “ground truth” can
provide a measure, in quantitative
terms, of the degree to which each
agent’s SA/SU may be imperfect. Such
a measure, based on allowing each
agent to act on an imperfect worldview,
supports evaluation of the operational
costs of uncertain, incomplete and/or
incorrect information. It also supports
explicit modeling of leader
decision-making processes based on such data,
of imperfect command and control,
and/or imperfect subordinate receipt of
and subsequent execution of orders.
This  kind  of  modeling  is  critical  if  we  are 
to  estimate  the  benefits  of  proposed 
new  or  modified  systems,  and/or 
adjustments  to  tactics,  techniques  and 
procedures. 
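
One way to make such a divergence measure concrete is sketched below. It assumes, purely for illustration, that both the agent's mental map and the ground truth are stored as landmark positions in a plane; the function name `divergence`, the landmark names and the coordinates are hypothetical. The sketch separates incorrect knowledge (position error) from incomplete knowledge (unknown landmarks).

```python
import math


def divergence(mental_map, ground_truth):
    """Compare an agent's believed landmark positions with ground truth.

    Returns (mean position error over known landmarks,
             list of landmarks missing from the mental map).
    """
    shared = [name for name in ground_truth if name in mental_map]
    missing = [name for name in ground_truth if name not in mental_map]
    if not shared:
        return 0.0, missing
    total = 0.0
    for name in shared:
        bx, by = mental_map[name]
        tx, ty = ground_truth[name]
        total += math.hypot(bx - tx, by - ty)
    return total / len(shared), missing


# Hypothetical landmark coordinates; not taken from any simulation.
ground_truth = {"bridge": (0.0, 0.0), "tower": (3.0, 4.0), "depot": (10.0, 0.0)}
mental_map = {"bridge": (0.0, 0.0), "tower": (0.0, 0.0)}

error, missing = divergence(mental_map, ground_truth)
print(error, missing)
```

Tracking these two quantities over simulation time would give the rate as well as the degree to which an agent's view departs from ground truth.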

1.3  Terrain  Representation 
and  Movement 

The  SA/SU  measures  discussed  above 
are  dependent  on  both  the  way  in  which 
a simulation represents terrain and
movement over that terrain. Movement
generally has several, possibly
overlapping components, which will be
referred to herein as:

• way finding - the process of learning
one’s environment to avoid obstacles
and find features and points of interest
(as is typical of robot “navigation”),
which may also incorporate following a
general search pattern or algorithm
until one’s objective is reached;

• route planning - the use of algorithms
and heuristics to plot a path and/or
define a list of instructions describing
how to get from one point to another;

• route following - the process of actually
moving along that path or in
accordance with those instructions.

Current models describe terrain (see for
example: [Reece 2003]; [Heib et al.
2006]; [Donlon & Forbus 1999]; [Glinton et
al. 2004]) in either metrical/Euclidean or
topological terms, or in some
combination of both. Euclidean
schemes focus on straight-line
distances between features of interest,
while topological schemes describe
spatial relationships (e.g., adjacency,
connectivity, and containment) between
such features. In both cases terrain is
often  overlaid  with  covering  polygons, 
which  can  be  regular  tessellating 
polygonal  tiles  (triangles,  squares  or 
hexagons),  or  irregular  polygon  covering 
schemes  such  as  Voronoi  diagrams. 

In  strictly  Euclidean  schemes  node-to- 
node  “distance”  metrics  are  based  on 
regular  grid  coordinates,  while  more 
generic  topological  approaches  can 
reflect  a myriad  of  relational  factors, 
such  as  trafficability,  the  availability  of 
cover/concealment,  and/or  influence 
ambits based on the proximity of
geopolitical configurations, static and/or
dynamic  adversary  threats,  and  the  like. 

One  of  the  most  popular  approaches  to 
route  finding  uses  arc-node  graphs  and 
shortest  path  algorithms,  e.g..  A*  or 
Dijkstra’s  algorithm.  Nodes  specify 


waypoints  along  a path,  with  arcs 
describing  the  connections  between 
these  nodes.  In  the  case  of  Euclidean 
tessellation  approaches,  nodes  typically 
coincide  with  the  polygons  or  tiles 
covering  the  space,  with  arcs  for  each 
shared  boundary  line.  In  a strict 
topological  view  the  nodes  are  only 
defined  for  points  of  interest,  with  the 
arcs  representing  possible  connections. 
In  either  case,  arc  costs  from  one  node 
to  another  can  reflect  any  and  all  of  the 
“distance”  metrics  described  above,  and 
can be used in “shortest” path
algorithms  to  determine  the  optimal  path 
through  the  arc-node  structure. 
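
The arc-node search described above can be sketched as follows. This is a minimal illustration of Dijkstra's algorithm, not the implementation used in any of the cited models; the graph `ARCS` and its cost values are hypothetical, and each arc cost is assumed to be a single non-negative number that already combines whatever "distance" factors apply.

```python
import heapq

def shortest_path(arcs, start, goal):
    """Dijkstra's algorithm over an arc-node graph.

    arcs maps each node to a dict {neighbor: non-negative arc cost}.
    Returns (total_cost, [nodes on path]) or (inf, []) if unreachable.
    """
    best = {start: 0.0}
    previous = {}
    frontier = [(0.0, start)]
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node == goal:
            # Walk the predecessor links back to the start node.
            path = [node]
            while node in previous:
                node = previous[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, arc_cost in arcs.get(node, {}).items():
            new_cost = cost + arc_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(frontier, (new_cost, neighbor))
    return float("inf"), []

# Hypothetical arc costs (e.g., distance inflated by poor trafficability).
ARCS = {
    "A": {"B": 2.0, "C": 5.0},
    "B": {"C": 1.0, "D": 4.0},
    "C": {"D": 1.0},
    "D": {},
}
```

Because the cost is just a number attached to each arc, the same search serves Euclidean and topological schemes alike; only the way the costs are computed differs.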

Under any of these schemes, the key
questions become: first, what does it
mean to be lost; second, how does an
agent find itself in such a state or states;
and finally, can the agent recognize the
problem (i.e., “know” it has bad spatial
SA/SU) and correct it?

2.0  METHODOLOGY

In  truth,  of  course,  in  virtually  any  real 
world  operational  situation,  the  SA/SU  of 
any  individual  or  organization  involved  in 
that  operation  is  going  to  be  less  than 
perfect,  with  imperfections  that  range 
from  negligible  to  catastrophic. 
Fortunately,  as  mentioned  above, 
human  decision-making  and  course  of 
action  (COA)  selection  tend  to  be  robust 
with  respect  to  even  many  significant 
imperfections,  and,  in  fact,  “good” 
decision-making  considers  such 
imperfections  explicitly.  For  example, 
military  plans  strive  to  make  provision 
for  inadequate/poor  intelligence  and 
associated  unexpected  events;  “no  plan 
survives  first  contact  with  the  enemy”. 

Humans  can  find  their  way  from  one 
point  to  another  with  very  rudimentary 
and/or  inaccurate  maps.  They  can 
frequently  function  satisfactorily  with 
ambiguous  and  unclear  directions.  Of 
course,  in  such  cases  some  degree  of 
vituperation  may  be  directed  at  the 
providers of these “direction” aids; a
reaction  that  itself  further  speaks  to  the 
nature  of  decision-making  under  stress 
and  uncertain  information.  An  effective 
simulation  of  getting  or  being  “lost” 
should  incorporate  both  this  human 
resilience  and  the  effects  of  such 
stresses  as  uncertainty  and  time 
pressure,  as  important  parts  of  the  costs 
and  effects  of  imperfect  SA/SU. 

In addition to incorporation of resilience
and the effects of stress on decision-making,
other requirements for an
effective simulation of being lost include:

• The capability for an entity’s view of
where it is and where it is going to be
different from ground truth;

• An  error  taxonomy  that  reflects  both 
types  of  being  lost  and  degrees  of 
“lostness”; 

• The  mechanisms  by  which  an 
individual  achieves  different  states  of 
being  lost;  and 

• The  mechanisms  by  which  an 
individual  recognizes  and  attempts  to 
correct  being  lost. 

2.1  Mental  Maps  and  Ground 
Truth 

The  approach  taken  herein  to  meet 
these  requirements  begins  with 
assuming  a specific  formulation  for  each 
simulated  entity’s  idiosyncratic  view  of 
the  world,  a “mental  map”  that 
represents  its  own  particular,  probably 
distorted,  view  of  ground  truth 
geography. 

The  mental  maps  proposed  herein  are 
based  on  an  arc/node  graph 
representation. Such a structure can
generally be accommodated by any of the
terrain  representations  discussed 
above,  and  can  be  used  for  both  route 
planning  and  route  following. 

Each  node  in  the  graph  will  have  one  or 
more  generic  “color”  attributes, 
characteristics  that  describe  features  of 
the  node  that  may  be  recognizable  by 


an  agent.  Such  attributes  might  reflect 
terrain  trafficability,  population  density, 
type  of  buildings  or  other  structures,  and 
so  forth.  Color  attributes  can  also  be 
used  to  suggest  regional  affiliations  for 
nodes,  for  example  geo-political 
associations,  threat  areas,  broad 
geographical  relationships  and  the  like. 

Each  node  will  also  have  a set  of  node 
neighbors  listing  the  color  attributes  of 
those  nodes  with  the  additional 
information  that  defines  relationships  of 
the  parent  node  to  its  neighbors, 
principally  direction  and  distance. 

The ground truth descriptions of node
attributes will be numerical or crisp set
attributes, while the mental map descriptions
will generally be fuzzy set
membership attributes. For example,
ground truth population density will be in
people per square mile, while the mental map
representation may be some degree of
urban, suburban, or rural. Ground truth
distance will be in meters or kilometers;
mental map distances will be close, not
too far, or remote. Ground truth directions
will be in degrees from true north; mental
map directions will be north, northeast,
east, and so forth.
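
As an illustration of the crisp-to-fuzzy mapping just described, the sketch below converts a ground truth population density into degrees of rural, suburban, and urban membership, and a bearing into the nearest compass label. The membership shapes and every breakpoint (500, 1500, 2000, 5000 people per square mile) are assumptions chosen only for the example; the paper does not specify them.

```python
def falling(x, full, zero):
    """Membership 1 at or below `full`, falling linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)

def rising(x, zero, full):
    """Membership 0 at or below `zero`, rising linearly to 1 at `full`."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)

def triangular(x, left, peak, right):
    """Membership 0 outside (left, right), 1 at `peak`, linear in between."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def density_memberships(people_per_sq_mile):
    """Fuzzy description of a crisp density (breakpoints are hypothetical)."""
    d = people_per_sq_mile
    return {
        "rural": falling(d, 500, 1500),
        "suburban": triangular(d, 500, 2000, 5000),
        "urban": rising(d, 2000, 5000),
    }

COMPASS = ["north", "northeast", "east", "southeast",
           "south", "southwest", "west", "northwest"]

def compass_label(bearing_deg):
    """Nearest of eight compass labels for a bearing in degrees from north."""
    return COMPASS[round((bearing_deg % 360) / 45) % 8]
```

A node with 3500 people per square mile is then half suburban and half urban, which is the kind of graded description a mental map would hold in place of the crisp figure.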

The  mental  maps  of  each  agent  in  the 
simulation will allow those entities to
misrepresent  ground  truth  at  both  the 
perceptual  level  (failing  to  correctly 
observe  ground  truth  data)  and  the 
cognitive  level  (failing  to  understand  or 
discern  ground  truth  from  the  data 
available  to  it).  The  use  of  fuzzy  set 
relationships  in  the  mental  map, 
however,  allows  the  entity  to  make 
decisions  based  on  fuzzy  inference 
rules,  i.e.,  using  a best  guess  or  best  fit 
approximation  between  an  uncertain 
mental  map  and  a crisp  ground  truth. 

The fundamental decision component
for each entity is the “next node”
selection operation. An agent plans its
movement from its mental map, but
actual movement takes place in ground
truth terrain.

Each entity will have a route plan based
on  the  mental  map  arc/node  graph  and 
actual  entity  movement  will  be  to  the 
ground  truth  node  that  best  corresponds 
to  the  “next  node”  in  that  route  plan.  In 
the  case  of  multiple  candidates  for  the 
ground  truth  “next  node”  a Monte  Carlo 
selection  will  be  made  based  on  the 
degrees  of  fuzzy  correspondence  to 
ground  truth  exhibited  by  the  mental 
map  nodes. 
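
A minimal sketch of such a Monte Carlo selection follows. It assumes that each candidate ground truth node has already been scored with a degree of fuzzy correspondence in [0, 1] to the planned mental map node; the function name and the proportional weighting rule are illustrative assumptions, since the paper does not fix a particular sampling scheme.

```python
import random

def choose_next_node(candidates, rng=random):
    """Draw a ground truth node with probability proportional to its
    degree of fuzzy correspondence to the planned mental map node.

    candidates maps ground truth node ids to degrees in [0, 1].
    Returns None when no candidate corresponds at all.
    """
    nodes = [node for node, degree in candidates.items() if degree > 0]
    if not nodes:
        return None  # nothing resembles the planned "next node"
    weights = [candidates[node] for node in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes a simulation run repeatable, which helps when comparing outcomes across mental map variants.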

In addition, the mental maps may have
incorrect data: arcs between nodes that
are not actually connected and, vice
versa, missing arcs between nodes that
actually are, or similarly nodes that
do not actually exist and missing
nodes that do.
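
From the simulation's point of view (not the agent's), these structural errors can be enumerated by comparing the two graphs directly. The sketch below assumes each map is stored as a set of directed `(from_node, to_node)` arcs and that nodes are known only through the arcs that touch them; the key names are hypothetical.

```python
def map_discrepancies(mental_arcs, truth_arcs):
    """Compare a mental map with ground truth.

    Both arguments are sets of (from_node, to_node) pairs.
    """
    mental_nodes = {node for arc in mental_arcs for node in arc}
    truth_nodes = {node for arc in truth_arcs for node in arc}
    return {
        "phantom_arcs": mental_arcs - truth_arcs,     # believed, not real
        "missing_arcs": truth_arcs - mental_arcs,     # real, not believed
        "phantom_nodes": mental_nodes - truth_nodes,
        "missing_nodes": truth_nodes - mental_nodes,
    }
```

Such a tally gives the experimenter a measure of how wrong a mental map is, even though the agent itself can only discover the errors through observation.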

Finally,  mental  maps  will  be  dynamic, 
with  data  being  continually  filled  in, 
confirmed,  or  refuted  by  observation, 
while  ground  truth  values  are  typically 
static,  unless  they  pertain  to 
presence/absence  of  entities. 

2.2  Being  “Lost” 

A taxonomy of being “lost” then begins
with one or more of the following
general conditions:

• having a mental map that coincides
with ground truth, but with a different
registration point - the agent thinks its
current node is different from its real
ground truth node;

• having a mental map that corresponds
with ground truth, but with an uncertain
or unknown registration point - the
agent is unsure or doesn’t know what
its current node is;

• having a mental map that fails to
correctly correspond to ground truth,
with bad arc/node network
connections - the agent thinks roads or
paths lead to places they don’t; and/or

• having a mental map that fails to
correctly correspond to ground truth,
with incorrect characterizations of
nodes and arcs - objects the agent
is interested in are in incorrect positions
with misleading fuzzy set attributes
and/or deceptive directional/distance
relationships as represented on the
agent’s mental map.

Clearly the “mental map” can be a
complex data structure, incorporating, for
example, a hierarchical structure with
different levels of terrain representation
based on scale of movement. [Richter
and Klippel 2005], for example, discuss
the concept of routes “as a sequence of
decision point / action pairs”, which may
be combined through spatial chunking,
grouping several decision point / action
pairs into a single route segment, which
they refer to as higher order route
direction elements (HORDE).

2.3  Getting  “Lost” 

At  its  core  a simulated  agent’s  mental 
map  must  address  the  fundamental 
question  at  each  stage  in  an  agent’s 
movement:  “where  to  go  next?”  The 
mental  map  needs  to  answer  this 
question  at  the  level  of  resolution 
appropriate to the simulation, which,
without loss of generality, will be
taken to be the “next node” selection,
whether that node represents a “nearest
neighbor”  point  on  a grid,  or  the  degree 
of  advancement  along  a specific  route 
segment  or  path. 

In such a simulation, how does an entity
“get lost”?

• by suffering from incorrect initial
registration, i.e., actually starting
movement at a node or grid coordinates
in the ground truth network that do not
correctly correspond with the mental
map;

• by first order “next node” decision point
errors - failing to choose the
correct ground truth “next node” in a
route plan because of ambiguities in the
mental map, i.e., misinterpreting the
ground truth features that correspond
to that map, as for example in failing to
recognize the correct intersection to
make a turn and/or making a turn at an
incorrect intersection;

• by second order “next node” decision
point errors - failing to recognize errors
in the route plan itself based on
fundamental mental map errors - trying
to move along mental map networks
that have extra/missing nodes, and/or
bad arcs.

2.4  Recognition  of  Being 
“Lost” 

Separate from being lost is recognizing
that condition; an entity can be totally
wrong about where it is and/or where it’s
going, but until it recognizes that fact it
will continue to act in accordance with
what it believes its mental map to be.

An entity can recognize it is “lost” in
several ways, each of which
also corresponds more broadly to the
recognition of poor SA/SU in
non-geo-spatial domains:

• insufficient  mental  map  data  - the 
entity’s  mental  map  (or  more  broadly  its 
SA/SU  of  the  current  operational 
situation)  does  not  provide  enough 
information to make a
reasoned judgment as to the correct
next move or other action, literally not
knowing which way to turn. Having
absolutely no data is relatively rare, but
having missing and/or uncertain data is
fairly commonplace. In such cases the
“next node” selection would be
basically  a random  draw  from  available 
ground  truth  nodes; 

• cumulative mental map discrepancies -
the entity’s general accumulation of
evidence throughout several “next
node” selections, i.e., as the entity
moves it observes critical differences
between ground truth and
environmental features expected in
accordance with the mental map, to the
point where the entity’s lack of belief in
its mental map renders it, as above,
without sufficient information as to the
correct next move; or

• an  abrupt  discontinuity  between  the 
mental  map  and  ground  truth,  for 
example  running  into  a ground  truth 
dead  end. 


Comparing  expectations  to  actual 
observed  ground  truth  phenomena  can 
be  likened  in  some  ways  to  the  use  of  a 
“dead  reckoning”  function  in  navigation. 
Given  the  uncertainty  and  possible 
inaccuracies  of  an  agent’s  mental  map, 
the  agent  needs  some  way  to  determine 
its  degree  of  being  “lost”,  which  will  be 
defined  by  thresholds  for  increasingly 
aggressive  measures  to  correct  mental 
maps  and  plans  of  movement. 
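
One way to picture such a thresholded measure is sketched below. It is an assumption-laden illustration, not the paper's method: the running score is an exponentially weighted average of the mismatch between expected and observed fuzzy attributes at each node, and the decay factor, threshold values, and response names are all hypothetical.

```python
class LostnessTracker:
    """Accumulates mismatches between expected and observed node features."""

    # Hypothetical thresholds on the running score, mildest response first.
    THRESHOLDS = [
        (0.25, "adjust metrics"),
        (0.50, "explore or seek data"),
        (0.75, "retrace route"),
    ]

    def __init__(self, decay=0.8):
        self.decay = decay  # weight kept from earlier observations
        self.score = 0.0

    def observe(self, expected, observed):
        """Update the score from two dicts of attribute -> degree in [0, 1]."""
        keys = expected.keys() | observed.keys()
        mismatch = sum(
            abs(expected.get(k, 0.0) - observed.get(k, 0.0)) for k in keys
        ) / max(len(keys), 1)
        self.score = self.decay * self.score + (1 - self.decay) * mismatch
        return self.score

    def response(self):
        """Most aggressive response whose threshold the score has reached."""
        action = "continue"
        for threshold, name in self.THRESHOLDS:
            if self.score >= threshold:
                action = name
        return action
```

With these settings a single surprising observation is tolerated, while a run of them escalates the response step by step, which matches the idea of increasingly aggressive correction.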

2.5  Correcting  for  Being 
“Lost” 

Given that an entity does recognize it is
lost, what measures can it take to
correct this situation? The answer is
dependent on the way in which the
entity became lost and what kind of “lost”
it perceives itself to be:

• if the entity suffers from insufficient
mental map data, it can either attempt
to gain more data, to “scout out the
environment” through exploration, or it
can pick a robust local wayfinding
strategy. For example, if lost in
a city one can frequently head in a
fixed direction with some confidence of
eventually striking some linear feature
or boundary landmark that will allow
reorientation or re-registration of one’s
mental map;

• as  long  as  the  entity  appears  to  be 
making  reasonable  progress  towards 
its  objective,  it  can  adapt  its  mental 
map  to  remove  incongruities  between 
that  map  and  observed  ground  truth. 
Such  incongruities  are  likely  to  be 
metrical  in  nature,  such  as  inter-node 
direction  and  distance  values  that  may 
be  somewhat  off  kilter; 

• on the other hand, the perception of
topological errors, such as missing
and/or extra nodes/arcs, will probably
result in the need to make fundamental
changes in the mental map, requiring
the acquisition of additional ground
truth data through exploration, or the
provision of intelligence from sources
external to the entity in question;

• on recognition of accumulated route
following errors or a discontinuity, the
entity may choose the option of
retracing the route to some earlier “next
node” decision point where there may
have been a significant possibility of
error, as for example when the choice
between two ground truth “next nodes”
was in some way difficult, either
because of not enough information or
because of a choice between two or
more very similar nodes.
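
The retracing option in the last item can be sketched as a search back through the entity's decision history. The sketch assumes the entity records, at each step, the correspondence degrees of the candidates it considered; the ambiguity measure (ratio of runner-up to best degree), the 0.5 cutoff, and the choice of the most recent ambiguous point are all hypothetical simplifications.

```python
def ambiguity(degrees):
    """Ratio of the runner-up degree to the best degree.

    1.0 means the top two candidates matched equally well; 0.0 means
    there was no real competition.
    """
    ranked = sorted(degrees, reverse=True)
    if len(ranked) < 2 or ranked[0] == 0:
        return 0.0
    return ranked[1] / ranked[0]

def retrace_point(history, min_ambiguity=0.5):
    """Index of the most recent decision point that was hard to call.

    history is a list of (node, candidate_degrees) in travel order.
    Returns None if no decision reached min_ambiguity.
    """
    for index in range(len(history) - 1, -1, -1):
        _node, degrees = history[index]
        if ambiguity(degrees) >= min_ambiguity:
            return index
    return None
```

A fuller treatment would also flag decisions made with too little information, the other source of difficulty named above.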

3.0  DISCUSSION 

Of  course,  the  methodology  proposed  in 
section  2 is  only  useful  if  one  can 
demonstrate  a correspondence  between 
the  actions  of  simulated  entities  and  real 
world  behaviors,  and  more  importantly, 
if  the  simulation  can  provide  insight  into 
those  behaviors  that  supports 
improvement  in  SA/SU  for  real  world 
operations. 

Such  demonstration  begins  with  the 
conduct  of  simulation  experiments  that 
explore: 

• relating  simulation  outcomes  to  the 
quality of SA/SU as measured by
mental  map/ground  truth  incongruities  - 
by  the  divergence  between  the 
expected  result  of  the  entity’s  actions 
and  the  observed  results; 

• appropriate  incongruity  threshold 
values  for  different  degrees  of 
corrective  actions; 

• possible “dead reckoning” functions to
support movement towards an
objective in the face of imperfect
SA/SU;

• the use of landmarks - unmistakable
ground truth features - to solve
registration  problems  with  mental 
maps; 

• incorporation  of  mental  map 
uncertainty  or  belief  values  in  the 
calculation  of  route  “distance”,  thus 
allowing  consideration  of  route 
robustness  with  respect  to  risk  of  errors 
in  a mental  map;  and 

• the  use  of  various  information 
technologies  to  support  mental  map 
corrections  and  updates. 
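
As one illustration of folding belief values into route “distance”, the sketch below inflates each arc's distance as the entity's belief in that arc falls. The linear penalty form and the `penalty` weight are assumptions made for the example; identifying a suitable function is exactly the kind of question the proposed experiments would address.

```python
def risk_adjusted_cost(distance, belief, penalty=2.0):
    """Inflate an arc's distance as belief in the arc falls.

    belief must lie in (0, 1]; a belief of 1 leaves the distance unchanged.
    """
    if not 0.0 < belief <= 1.0:
        raise ValueError("belief must be in (0, 1]")
    return distance * (1.0 + penalty * (1.0 - belief))

def route_cost(route, arcs, penalty=2.0):
    """Sum of risk-adjusted costs along a route given as a list of nodes.

    arcs maps (from_node, to_node) to a (distance, belief) pair.
    """
    return sum(
        risk_adjusted_cost(*arcs[(a, b)], penalty)
        for a, b in zip(route, route[1:])
    )

# Hypothetical mental map: a short but doubtful route via B and a
# longer, well-known route via C.
MENTAL_ARCS = {
    ("A", "B"): (1.0, 0.4), ("B", "D"): (1.0, 0.4),
    ("A", "C"): (1.5, 1.0), ("C", "D"): (1.5, 1.0),
}
```

With `penalty=0` the doubtful route via B is the shorter one; with the default penalty the well-known route via C becomes preferable, which is the robustness trade-off described in the list above.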


4.0  SUMMARY  AND 
CONCLUSIONS 

The  goal  of  this  paper  is  not  to  develop 
a new  theoretical  understanding  of 
SA/SU and decision-making. Rather, it is
to  propose  an  engineering  solution  to 
the  practical  problems  faced  by 
decision-makers  who  must  devise 
information  system  requirements  and 
evaluate  the  technological  approaches 
that  may  be  proposed  to  meet  those 
requirements. 

The  bottom  line  for  that  solution  is 
simulation  of  the  actions  of  an  entity 
taken  in  accordance  with  that  agent’s 
unique  SA/SU  and  in  expectation  of 
fulfilling  one  or  more  goals.  By 
implementing an appropriate set of data
structures  and  inference  procedures,  an 
entity  should  be  able  to  compare 
expectations  to  observable  aspects  of 
the  environment.  Entity  behaviors  are 
then  seen  as  a cycle  of 
updating/correcting  SA/SU,  followed  by 
modification  of  behaviors  as  that  new 
SA/SU  suggests,  until  goals  are 
achieved  or  a recognized  failure  point 
occurs. 

The  hope  is  that  focusing  on  the 
simulation  of  “being  lost”  in  a geo-spatial 
sense can also provide a template for
dealing with being “lost” in more generic
imperfect SA/SU contexts. The
uncertainty  and  errors  that  may  be 
present  in  geo-spatial  information 
certainly  provide  a potentially  rich 
source  of  imperfect  SA/SU  for 
simulation  experiments  and  studies. 
Abstract. Using an eye tracker we examined decision-making processes during an internet search task. Twenty experienced
homebuyers and twenty-five undergraduates from Old Dominion University viewed homes on a simulated real estate website.
Several of the homes included physical properties that had the potential to negatively impact individual perceptions. These negative
externalities were either easy to change (Level 1) or impossible to change (Level 2). Eye movements were analyzed to examine the
relationship between participants’ “stated preferences” [verbalized preferences], “revealed preferences” [actual decisions], and
experience.  Dwell  times,  fixation  durations/counts,  and  saccade  counts/amplitudes  were  analyzed.  Results  revealed  that 
experienced  homebuyers  demonstrated  a more  refined  search  pattern  than  novice  searchers.  Experienced  homebuyers  were  also 
less  impacted  by  negative  externalities.  Furthermore,  stated  preferences  were  discrepant  from  revealed  preferences;  although 
participants  initially  stated  they  liked/disliked  a graphic,  their  eye  movement  patterns  did  not  reflect  this  trend.  These  results  have 
important  implications  for  design  of  user-friendly  web  interfaces. 

1.0  INTRODUCTION 

Every day a large number of people are
utilizing  the  internet  for  everything  from 
email  to  grocery  shopping.  This  use  places 
a greater  emphasis  on  the  quality  and 
quantity  of  information  being  presented, 
thus making the design and layout of web
pages  a crucial  component  to  decision 
making  and  user  satisfaction.  The  internet 
affords  people  the  opportunity  to  make 
decisions  and  purchase  goods  online  with 
the  simple  click  of  a button.  Whether  the 
decision  involves  the  purchase  of  a 
computer,  a car,  or  even  a home,  a 
significant  proportion  of  preliminary 
purchase  decisions  (or,  “homework”)  can  be 
accomplished  without  ever  having  to  leave 
the  comfort  of  one’s  home.  The  information 
on  specific  aspects  of  these  designs  and 
their  impact  on  a consumer  becomes  a very 
important  consideration  in  this  environment. 

1.1  Role of Experience
Experience  often  affects  how  individuals 
interact  with  their  environment  and  the 
internet  is  no  exception.  The  amount  of 
expertise  an  individual  possesses  has  been 


shown to guide visual search [1], with
experts having a much more refined and
effective visual search pattern. The study
performed by Reference [1] demonstrated
that  experts  tended  to  have  longer  fixations 
on  items  of  importance  to  their  search  and 
their  gaze  remained  central  to  the  visual 
scene.  Novices  in  comparison  tended  to 
scan  the  entire  scene,  with  no  true  direction 
or  long  fixations  on  anything  of  particular 
importance  to  the  search. 

Experts  and  novices  not  only  differ  in  the 
manner  that  they  scan  a visual  scene  but 
also  in  the  approach  taken  to  analyzing  and 
inferring  information  from  it.  Reference  [2] 
found  that  when  it  came  to  induction  and 
reasoning  experts  were  more  flexible  than 
novices  in  their  ability  to  reason  and  induce 
information  from  a visual  scene.  Overall,  it 
has  been  found  that  experts  use  past 
experience  and  previous  knowledge  to  not 
only  guide  visual  search,  but  to  compensate 
for  any  declining  task-specific  abilities  [3]. 
Experts  use  contextual  cues  and  location 
cues  to  guide  many  of  their  visual  searches. 
This  also  allows  them  to  become  much 
faster  at  refining  visual  searches,  with 
reaction times shortening with age and
expertise.  Experience  with  certain  visual 
cues  can  also  have  an  effect  on  visual 
search  of  a scene,  with  knowledge  of  former 
individual  cues  influencing  an  individual  in 
either  a positive  or  a negative  way. 

1.2  Negative  Externalities 

We  know  from  past  research  [4]  that  the 
visual  display  of  a website  can  have  a large 
impact  on  an  individual’s  task  performance 
and  in  general  their  primary  search  and 
satisfaction.  Individuals  place  a premium  on 
their  time;  when  they  use  the  internet,  they 
expect  to  find  the  most  relevant  information 
to  their  problem  quickly.  Most  of  the 
research  generated  on  visual  layout  is 
studied  from  the  perspective  of  the 
effectiveness  of  a graphic.  This  study  differs 
from  previous  research  on  graphics  in  that 
we  are  looking  at  how  the  unpleasantness 
of a graphic, or a negative object (referred
to as a “negative externality”), can impact
the user, not only in their
visual search but also in their preference for a
particular visual scene.

1.3  Stated  versus  Revealed 
Preferences 

The  question  of  interest  is  whether  it  is 
possible  to  design  an  effective  website 
using  the  stated  preferences  of  individuals. 
Do  internet  users  really  know  what  it  is  they 
are  searching  for  and  if  so,  are  they  able  to 
convey  it  verbally?  Do  verbally  stated 
preferences  match  with  preferences  that  are 
revealed  during  actual  internet  search? 

Organizations  of  all  sizes  and  interests 
spend  large  amounts  of  money  every  year 
on gathering consumers’ stated
preferences (SPs) and revealed
preferences (RPs) [5]. They use this
information  to  do  what  they  called 
“Consumer  Forecasting,”  or  predicting  what 
consumers  would  want  in  the  future.  They 
were  given  access  to  large  databases  filled 
with  survey  and  interview  information  (SP) 
as  well  as  purchase  histories  (RP)  and  were 
then  asked  to  predict  what  consumers 
would  do  based  on  all  of  the  data.  The 


predictions  they  made  were  conflicting 
depending  on  the  type  of  information  they 
primarily  used  (SP  or  RP).  This  would  seem 
to  demonstrate  that  there  is  a potential 
discrepancy  between  SP  (stated 
preference)  and  RP  (revealed  preference). 

1.4  Eye  Tracking 

Eye  movements  are  the  most  frequent  of  all 
human  movements  and  a reliable 
physiological  measure  of  a psychological 
state. Eye tracking methodology is based on
the “eye-mind” hypothesis of Reference [6]: the
location  of  a person’s  gaze  directly 
corresponds  to  the  most  immediate  thought 
in  their  mind.  Monitoring  an  individual’s  eye 
fixations  (where  the  eye  stops  for  a 
moment),  their  saccades  (the  rapid 
movements  of  the  eye),  and  scan  paths 
allows  us  to  gain  insight  into  certain  aspects 
of  an  individual’s  cognitive  processes  at  a 
particular moment in time. This is due to the
close tie between eye movements and attentional
mechanisms.

Previous  eye  tracking  studies  have  been 
used  to  specifically  study  how  individuals 
read  and  scan  websites  on  the  internet  [7]. 
When  people  encounter  cognitively  complex 
material,  the  rate  at  which  they  read  tends 
to  slow  down,  as  can  be  indicated  from 
increases  in  fixations  and  decreases  in 
saccade  durations  [8].  In  our  domain  of 
interest,  eye  tracking  can  be  used  as  an 
unobtrusive  way  to  gain  access  and  insight 
into  what  a potential  homebuyer  is 
interested  in  as  they  view  homes  on  the 
internet. 

1.5  Purpose of the present study

This  study  was  designed  to  assess  the 
intrinsic  factor  of  experience  and  its 
relationship  to  extrinsic  negative 
externalities  (pink  paint  and  power  lines). 

SP and RP were evaluated in order to
determine if a discrepancy existed. RP was
assessed through the length and number of
fixations (a fixation being the point at which the eye
stops moving for a moment), as well as the number
and amplitude of saccades. From these measures
we are able to assess how difficult and
important the information being viewed is
[9], because intense
cognitive processing occurs during a fixation
[10]. Thus, we hypothesized that if a person
views  something  important  to  them  they 
should  have  a greater  number  of  fixations 
and  longer  durations  for  each  fixation.  It 
was  also  hypothesized  that  the  greater 
experience  an  individual  possessed  for  the 
search  task  the  more  refined  their  visual 
search  pattern  would  be.  In  assessing  these 
variables,  valuable  information  was 
gathered  regarding  the  optimal  design  of 
these  websites.  This  information  will  allow 
web  designers  to  present  the  most  salient 
and  important  information  to  potential 
homebuyers  quickly  and  effectively. 

2.0  METHOD 

2.1  Participants 

Twenty-five  undergraduates  from  Old 
Dominion  University  and  twenty 
experienced  homebuyers  from  the 
community  were  recruited  to  participate  in 
this study. There were no age requirements
for participants, who all had normal or
corrected vision (some participants wore
contacts but no participants wore glasses);
none of the participants were colorblind.
Undergraduate participants who finished the
experiment were compensated 2 extra
credit points at the end of the experimental
session and experienced homebuyers were
given a $50 gas card if they completed the
study.

All  participants  viewed  ten  homes;  the  same 
ten  homes  were  shown  to  all  participants 
albeit  in  a different  order.  Four  homes  were 
digitally  altered  such  that  they  possessed 
two  levels  of  what  we  designated  as 
“negative externalities.” A home with a
Level  1 externality  had  a living  room  with  a 
bright  pink  wall;  this  was  considered  a Level 
1 negative  externality  due  to  the  fact  that  a 
homebuyer  could  easily  change  pink  paint. 

A home  with  a Level  2 externality  present 
included  a power  line  in  the  curb  appeal 
picture  (the  first  picture  of  the  home  a 
participant saw). Power lines were labeled a
Level 2 externality due to the fact that the
homebuyer could not change them. The
homes as well as the individual rooms
within  each  home  were  viewed  in  random 
order  except  for  the  curb  appeal  picture  that 
always  appeared  first;  a separate  computer 
program  generated  this  random  order.  Only 
four  homes  were  altered  to  include  the 
negative  externalities;  each  participant 
viewed  a home  with  a Level  1 and  Level  2 
externality  during  the  experiment. 
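
The ordering constraint described above (homes and rooms in random order, with the curb appeal picture always first within a home) can be sketched as follows. This is not the program used in the study; the function name, the data layout, and the photo names other than the curb appeal picture are hypothetical.

```python
import random

def presentation_order(homes, rng):
    """Randomized viewing sequence of (home, photo) pairs.

    homes maps a home id to its list of photo names, one of which is
    "curb appeal". Homes are shuffled, and within each home the curb
    appeal photo comes first, followed by the other rooms in shuffled order.
    """
    home_ids = list(homes)
    rng.shuffle(home_ids)
    order = []
    for home in home_ids:
        rooms = [photo for photo in homes[home] if photo != "curb appeal"]
        rng.shuffle(rooms)
        order.extend((home, photo) for photo in ["curb appeal"] + rooms)
    return order
```

Seeding the generator, for example `presentation_order(homes, random.Random(7))`, would let the experimenter reproduce the sequence each participant saw.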

2.2  Materials  and  procedure 

We used an EyeLink 1000 eye tracker,
which is a desk-mounted eye tracking
system offering 1000 Hz pupil and CR
(corneal reflection) eye tracking (it takes 1000
measurements per second). Participants
were asked to rest their head on a chin rest
during the experiment, ensuring reliability of
the EyeLink camera. All participants viewed
10  homes  presented  in  random  order,  very 
similar  to  a typical  real  estate  website;  the 
experimenter  kept  track  of  the  sequence  of 
the  homes  for  data  collection  purposes 
later.  Of  these,  each  participant  viewed  two 
‘substandard’  homes  - one  home  with  a 
bright  pink  painted  wall  (Level  1 negative 
externality)  and  one  home  with  power  lines 
in  the  curb  appeal  photograph  (Level  2 
negative  externality).  Photographs  were 
selected  by  the  real  estate  agency. 

In  order  to  counterbalance  these  homes,  the 
first  half  of  the  participants  observed  the 
house  with  pink  paint  as  house  #4  and  the 
house  with  power  lines  as  house  #7.  The 
second  half  of  the  participants  viewed  the 
pink  paint  on  house  #5  and  the  power  lines 
on  house  #9.  The  homes  were  presented  in 
random  order. 

After viewing a room, participants would
rate it on a scale from 1 (worst version
of that room) to 9 (best version of that
room). This rating for each room was
treated as the measure of “SP” in the
analyses below. Once the rating had been
provided, the experimenter would move on
to the next picture. All participants received
a short 5-minute break after viewing the first
5 homes.

The  dependent  variables  of  interest  were 
fixation  duration/count,  saccade  count,  and 
saccade  amplitude. 

3.0  RESULTS 

To  evaluate  revealed  preferences,  the  eye 
tracking  variables  of  fixation  duration/count, 
and  saccade  count/amplitude  were 
analyzed for each of the homes containing a
negative externality (houses #3 and #4, pink
paint; houses #6 and #8, power line) using 6
(rooms) X 2 (gender) X 2 (negative
externality) repeated-measures ANOVAs.

This  allowed  for  evaluation  of  the 
relationship  between  intrinsic  and  extrinsic 
factors  and  their  effects  on  revealed 
preferences. 

3.1  Role  of  Experience 

A 6 X 2 X 2 ANOVA of fixation duration
revealed a significant interaction of
Homebuyer X Level 1 negative externality,
F(5,185) = 4.91, p = .03, partial η² = .117.
Experienced homebuyers had a longer
fixation duration when a Level 1 negative
externality was present, and the novice
students had a distinctly opposite reaction,
with fixation duration declining with the
presence of pink paint (Level 1 negative
externality). The 6 X 2 X 2 ANOVA also
revealed a significant interaction of
Homebuyer X Level 2 negative externality,
F(5,185) = 12.09, p < .001, partial η² = .246.
Experienced homebuyers again had a
longer fixation duration when the house
contained a Level 2 negative externality
(power line), but novice students, as before,
had a decreased fixation duration in the
presence of a Level 2 negative externality
(see Figures 1 and 2).
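
For readers who wish to relate the effect sizes to the F statistics above, partial eta squared can be recovered from an F value and its degrees of freedom through the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies it to the two fixation duration interactions; the function name is ours.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)

# The two fixation duration interactions reported above.
print(round(partial_eta_squared(4.91, 5, 185), 3))   # 0.117
print(round(partial_eta_squared(12.09, 5, 185), 3))  # 0.246
```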

The 6 X 2 X 2 ANOVA of fixation counts
revealed a significant 3-way interaction of
room X gender X experience, F(5,185) =
2.41, p < .04, partial η² = .061. Experienced
male homebuyers had a significantly
smaller number of fixations, specifically for
the curb appeal photograph (M = 57.6, SD =
9.10) compared to novice male
undergraduates (M = 75.4, SD = 10.24). In
contrast, experienced female homebuyers
had a greater number of fixations (M = 66.5,
SD = 9.10) compared to the novice female
undergraduates (M = 51.58, SD = 7.10) in
the presence of a Level 1 externality (see
Figure 3).

Figure 1. Fixation duration comparison
between neutral and Level 1 negative
externality.

Figure 2. Fixation duration comparison
between neutral and Level 2 negative
externality.

Figure 3. Fixation counts of experienced
homebuyers vs. novice students.

A significant 3-way interaction of room X
gender X experience was also found for
saccade count, F(5,185) = 2.37, p < .04,
partial η² = .060. Experienced male
homebuyers had a significantly smaller
number of saccades, specifically for the
curb appeal photograph (M = 57.5, SD =
9.10), compared to novice male
undergraduates (M = 75.3, SD = 10.31).
Experienced female homebuyers, on the
other hand, had a greater number of
saccades (M = 66.2, SD = 9.10) compared
to the novice female undergraduates (M =
51.47, SD = 7.10) in the presence of a Level
1 externality.

Lastly, results indicated that in the presence
of both a Level 1 and a Level 2 negative
externality, experienced homebuyers had
greater saccade amplitudes than their
student counterparts, F(5,185) = 4.53, p <
.04, partial η² = .100, and F(5,185) = 3.39,
p < .07, partial η² = .076, respectively.


3.2 Stated Preferences

Participants were given a General home
survey that asked them to rate on a scale of
1 to 9 how important a room would be to
them in a home search. The 6 (rooms) X 2
(experience) ANOVA revealed a main effect
of room, F(5,105) = 3.95, p < .01, partial η²
= .084. The curb appeal photograph was
consistently rated lower in perceived
importance by the novice students (M =
6.34, SD = .36) than by the experienced
homebuyers, who gave it a much higher
importance rating (M = 7.45, SD = .35).

In the Home specific surveys, a 3-way
interaction of room X Level 1 externality
X experience was found, F(5,105) = 4.94, p
< .03, partial η² = .108. Of interest was the
rating given for the living room: when it
contained pink paint, experienced
homebuyers rated it lower (M = 4.0, SD =
.57) than novice students (M = 5.69, SD =
.99). When it was neutral, experienced
homebuyers rated it higher (M = 6.5, SD =
.56) than novice students (M = 5.8, SD =
.52).

3.3 Scan paths

The scan paths of experienced homebuyers
and novice students differed in the number
and the amplitude of saccades, and this
difference also appeared to be tempered by
gender. Results and scan paths
demonstrated that novice male students
had a greater number of saccades; their
eyes traveled around the photographs more
often, and their saccade amplitudes were
shorter, such that their movements were
small bursts across the visual scene. By
comparison, experienced male homebuyers
had a smaller number of saccades; their
eyes moved around the photograph less
often, and their saccade amplitudes were
longer, with fixations closer together. It
appears experienced male homebuyers had
a predetermined idea of where in the visual
scene they wanted to look.

Similar to the pattern for male participants,
novice female participants also had fewer
saccades but, just like their novice male
counterparts, with shorter amplitudes. Their
eye movements were also short, quick
movements around the visual scene.
Experienced female homebuyers, in
contrast to novice female participants, had a
greater number of saccades with longer
saccade amplitudes. They again appeared
to have specific points on the screen that
they wished to analyze, as indicated by the
longer saccade amplitudes, similar to
experienced male homebuyers (see Figure 4
for sample scan paths).

Figure 4. Scan paths with Level 2 negative externality present.
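
To make the two scan-path measures concrete, the following minimal
sketch derives saccade count and mean saccade amplitude from an ordered
list of fixations. It rests on several assumptions that are not from
the study: a saccade is taken to be the jump between two consecutive
fixations, positions are in screen pixels (eye trackers usually report
amplitude in degrees of visual angle), and both example scan paths are
invented purely for illustration.

```python
import math


def saccade_metrics(fixations):
    """Return (saccade_count, mean_amplitude) for ordered fixations.

    Each fixation is an (x, y) position in pixels (an assumption).
    A saccade is the jump between two consecutive fixations.
    """
    amplitudes = [math.dist(a, b) for a, b in zip(fixations, fixations[1:])]
    count = len(amplitudes)
    mean_amplitude = sum(amplitudes) / count if count else 0.0
    return count, mean_amplitude


# Hypothetical scan paths, not data from the experiment.
short_bursts = [(100, 100), (130, 140), (160, 100), (190, 140), (220, 100)]
long_jumps = [(100, 100), (400, 500), (700, 100)]

burst_count, burst_amplitude = saccade_metrics(short_bursts)
jump_count, jump_amplitude = saccade_metrics(long_jumps)
print(burst_count, burst_amplitude)
print(jump_count, jump_amplitude)
```

The first path has many small movements and the second has few large
ones, which is the kind of contrast the scan-path comparison describes.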

4.0  DISCUSSION 

The  purpose  of  this  study  was  to  investigate 
the  intrinsic  factor  of  experience  and  the 
extrinsic  factor  of  negative  externalities  on 
stated  and  revealed  preferences  in  an 
internet  search  task. 

4.1  Role  of  experience 

Results revealed that experience with internet
home search made a difference in the way
in which the search task was performed.
Gender also appeared to moderate the effect
of experience. For males, experience was
expressed through fewer fixations with
longer durations, fewer saccades, and
longer saccade amplitudes, indicating that
they were focused on specific aspects of
photographs and had preconceived ideas
about what they wished to investigate.
Experienced females, in contrast to their
novice female counterparts, had a greater
number of fixations, with longer durations,
and fewer saccades, but as with
experienced male homebuyers they also
showed longer saccade amplitudes,
again indicating a clear idea of where to
direct their gaze in the visual
scene. Experienced male participants had
the fewest fixations and saccades,
which seems to be
indicative of low interest in the photographs
altogether (reference [10]).

Saccade amplitudes and fixation placement
were interesting in this study. The saccade
amplitudes followed distinctly opposite
patterns for experienced versus novice
participants. Experienced homebuyers
demonstrated a longer array of visual
movements than novice students across the
webpage, which may indicate a
preconceived idea of where in the visual
scene they would find information relevant
to their search. The scan paths make it
clear that experienced homebuyers tended
to fixate on the central portions of the visual
scene and spent less time on the perimeters,
which supports previous research
(reference [1]) showing that experts tend to
focus on the central aspects of a visual scene.

The present study is unique in that it takes
into account differences in experience using
physiological measures of an individual
during a search task, focusing on the eye
tracking indices of dwell time, fixation
duration/count, and saccade
count/amplitude. These physiological data
suggest that experienced homebuyers
might be better at acquiring target specific
information than novice students, since they
seem to localize their area of interest
quickly (revealed by longer saccade
amplitudes and fewer saccades).

4.2  Role  of  negative  externalities 

Negative externalities in this study were
operationalized in two levels depending on
the ease with which they could be modified by
the user - Level 1 (pink paint) and Level 2
(power lines). From previous studies we
know that a greater number of fixations and
longer fixation durations indicate that viewers
are focusing intense cognitive resources on
the object being viewed [10]. In this study
the effect of the negative externality varied
with the experience of the participant and the
room that was being viewed. When viewing
the living room photograph with pink paint
(Level 1 externality), experienced
homebuyers found it to be less of a
detractor than novice students did. The
presence of the pink paint did not stop the
experienced homebuyers from investigating
the home more fully, in contrast to the novice
students.

The same was true when a power line, or
Level 2 negative externality, was present.
Again, the fixation duration and fixation
count were affected. The experienced
homebuyer, when presented with the Level 2
externality, would spend more time looking
at the home. Their eyes would stop more
often and for a longer duration compared to
the novice student. The novice student
would spend less time looking at the home
photographs, and would fixate less often and
for a shorter duration, compared to the
experienced homebuyers. The results
indicate that the experienced homebuyers
were less distracted by the presence of a
Level 2 negative externality and had more
interest in the home photographs as a
consequence.

It appears that the Level 1 and Level 2
externalities led to fewer physiological
influences on visual search for experienced
homebuyers; instead, their presence
perhaps gave experienced homebuyers
additional reason to scrutinize the
photographs carefully, possibly to find
positive aspects to compensate for the
presence of the Level 1 and Level 2
negative externalities. In either case, the
presence of a negative externality appeared
to affect how participants viewed the entire
home. This is interesting news for
designers; it is evident that one “bad apple”
could have the potential to spoil the entire
barrel.

4.3  Stated  versus  Revealed 
Preferences 

In the General survey, novice students rated
the curb appeal photograph as being of
little importance to their visual search, in
contrast to the experienced homebuyers,
who gave it a much higher rating of
importance for the home search. A
discrepancy was observed when these
ratings were compared to the RP determined
through eye tracking variables for the novice
students. For fixation count and saccade
count there was a main effect of room
caused by the curb appeal photograph,
regardless of experience. In other words,
the curb appeal photograph generated the
maximum interest during visual search. The
experienced homebuyers realized this, and
there was no discrepancy between their SP
and RP, but the novice students did not. It is
interesting that this finding was consistent
even for participants who took the General
survey after viewing all of the homes.

The Home specific survey also
demonstrated a discrepancy between SP
and RP, but only for the Level 1 negative
externality. When pink paint was present, the
ratings given (SP) were the opposite of the eye
tracking measures recorded (RP).
Regardless of experience, participants rated
the living room photographs low, but their
eye tracking variables show that they
spent a considerable amount of time
viewing those same photographs,
illustrating a discrepancy. This discrepancy
was larger for the novice students than for the
experienced homebuyers. It
was not found for the Level 2 negative
externality.

Overall, our results indicate that there is a
difference between a person’s stated
preferences and revealed preferences.
Although it was not consistent across all
variables, a discrepancy was found, and
experience may temper how large the
discrepancy is or whether one exists at all.

4.4  Implications 

In the present study, a trend toward
discrepancies between SP and RP was
found dependent on experience, supporting
previous research [11] that preference may
be something that is formed in many
different stages of a decision and that
experience may solidify that preference [2].
Furthermore, differences were found in the
search patterns that were used by the
experienced and novice participants as a
function of what they were looking at on the
webpage. These results have significant
implications for web design for the
population in general. The scan paths
revealed that the graphic portions of the
web pages were indeed where participants
spent the greatest amount of their time
looking, reinforcing the idea that visual
aspects of a web page are the most
important. Knowing your audience and the
amount of experience they possess as they
view a webpage carries important
considerations for design in the future.

Investigating Intrinsic and Extrinsic
Variables During Simulated Internet
Search


Molly  Liechty,  B.S. 

Poornima  Madhavan,  Ph.  D. 
Old  Dominion  University 


Purpose  of  the  Study: 

• Assess  the  intrinsic  factor  of  experience 
and  its  relationship  to  extrinsic 
negative  externalities  (pink  paint  and 
power  lines) 


• Stated vs. revealed preferences

Previous Research


• Role of Experience

- Expertise has been shown to guide visual search
(Williams, Ward, Knowles, & Smeeton, 2002)

• Experts - more refined / Novices - scanned entire scene

• Experts - more flexible / Novices - rigid ideas (Shafto &
Coley, 2003)

• Preference

- Discrepancies have been studied, but not through
the use of physiological measures (Horsky,
Nelson, & Posavac, 2004; Simonson, 1999; Zajonc,
1980)


Research  Questions: 

• Will  experience  alter  the  way  in  which  homes 
are  viewed? 

• How  will  negative  externalities  impact  the 
overall  visual  search? 

• Will  stated  preferences  differ  from  revealed 
preferences?

Ocular Tracking

Just  & Carpenter  (1976)  "Eye-mind"  hypothesis 


Eye  Tracking  Variables 

• Dwell  Time 

• Fixation  Duration 

• Fixation  Count 

• Saccade  Count 

• Saccade  Amplitude 

— (Loftus  and  Mackworth,  1978) 

- (Rayner 1998)

Negative Externalities


Pink  Paint 


Power  line 


Experimental  Design: 

• 25 ODU undergraduates
- Received class credit

• 20 Homebuyers from the community
- Received $50 gas card

• No time limit to view photographs

• 10 homes; each home has 6 photographs
(curb appeal, kitchen, living room, master
bedroom, master bathroom, and back yard)

Surveys


• Demographic

• General

- Curb Appeal

1 2 3 4 5 6 7 8 9

• Home Specific

- 1 2 3 4 5 6 7 8 9 (1 = Worst, 5 = Average, 9 = Best)

Hypotheses:

• Interest  in  a scene  = 

- longer  dwell  times 

- longer  fixation  durations 

- greater  number  of  fixations 


• Greater  experience  = 

- more  refined  search 


• The  presence  of  negative  externalities  = 

- shorter  dwell  times 

- shorter fixation durations/counts

Analysis

6 (rooms) X 2 (expert/novice) X 2 (Negative Externality)

6 (Home survey) X 2 (expert/novice) X 2 (Negative Externality)
6 (General survey) X 2 (expert/novice)

Scan  paths 


THIS  WAS  DONE  FOR  THE  FOUR  HOMES  THAT  CONTAINED 
OUR  NEGATIVE  EXTERNALITIES  (HOUSE  #3  & #4,  PINK 
PAINT/  HOUSE  #6  & #8,  POWER  LINE) 


Experience 

• Fixation Duration: Homebuyer X Negative externality

- F(5,185) = 4.91, p = .03, partial η² = .117 (Level 1)

- F(5,185) = 12.09, p < .001, partial η² = .246 (Level 2)

Experience

• Fixation Count: room X gender X experience

- F(5,185) = 2.41, p < .04, partial η² = .061

• Saccade Count: room X gender X experience

- F(5,185) = 2.37, p < .04, partial η² = .060


Experience 

• Saccade  Amplitudes 

• 6 (room)  X 2 (experience)  X 2 (negative  externality) 

- Main  effect  of  experience  (both  Level  1 and  2 neg.  externality) 


House 3  House 4  House 6  House 8

Experienced Male Homebuyer


Experienced  Female  Homebuyer 


Novice  Male  Student 


Novice  Female  Student 


Interest  Areas 

Main  Effect  of  Interest  Area: 

House 3: F(5,185) = 22.07, p < .01, partial η² = .408
House 4: F(5,185) = 27.07, p < .01, partial η² = .451
House 6: F(5,185) = 10.62, p < .01, partial η² = .244
House 8: F(5,185) = 13.27, p < .01, partial η² = .287


IA1  IA2  IA3  IA4  IA5  IA6

Interest Areas

Stated vs Revealed Preferences

General  Survey  Rating 

Main effect of room, F(5,105) = 3.95, p < .01,
partial η² = .084


Curb Appeal  Kitchen  Living Room  Master Bedroom  Master Bathroom  Back Yard

Dwell Time (ms)


Stated  vs  Revealed  Preferences 


• Home Survey: room X Level 1 externality X experience was found

- F(5,105) = 4.94, p < .03, partial η² = .108

Discussion


• Experience: 

• Experience  with  internet  home  search  made  a 
difference  in  the  way  in  which  the  search  task  was 
performed. 

• Saccade amplitudes followed distinctly opposite
patterns for experienced versus novice participants.
Experienced homebuyers demonstrated a longer array
of visual movements than novice students across the
webpage, which may indicate a preconceived idea of
where in the visual scene they would find
information relevant to their search

Discussion


• Experience: 

• These  physiological  data  suggest  that 
experienced  homebuyers  might  be  better  at 
acquiring  target  specific  information  than 
novice  students  since  they  seem  to  localize 
their  area  of  interest  quickly  (revealed  by 
longer  saccade  amplitudes  with  a fewer 
number  of  saccades  and  fixations). 


Discussion 

• Negative  Externalities: 

-Varied  with  the  experience  of  the  participant 

— Level  1 and  Level  2 externalities  led  to  fewer 
physiological  influences  on  visual  search  for 
experienced  homebuyers  compared  to  novice 
homebuyers 

• Stated  vs  Revealed  Preferences 

— There  was  a discrepancy  present 
• Greater for novice homebuyers

Practical Implications


* Research  Implications 

— Discrepancies  between  stated  and  revealed  preferences 
can  occur  and  researchers  need  to  be  conscious  of  this. 
Level  of  experience  seems  to  interact  with  this 
discrepancy.  An  individual  very  familiar  with  a situation 
may  have  more  concrete  preferences. 


* Web  Design  Implications 

- Graphics  can  be  very  influential 

- Negative aspects of an image may or may not be
detrimental depending on the individual's experience level

5.15 Following Human Footsteps: Proposal of a Decision Theory Based on
Human  Behavior 

Following  Human  Footsteps:  Proposal  of  a Decision  Theory  Based  on 

Human  Behavior 

Faisal  Mahmud,  M.Sc. 

Ph.D.  Student  and  Research  Assistant 
Department  of  Civil  and  Environmental  Engineering 
Old  Dominion  University 
fmahm001@odu.edu

Abstract. Human behavior is complex in nature and depends on circumstances and decisions that vary from time to time as well as
from place to place. The way a decision is made is either directly or indirectly related to the availability of the options. These options, though
they appear random in nature, give a solid direction to decision making. In this paper, a decision theory is proposed which is
based on human behavior. The theory is structured with model sets that show all the possible combinations for making a
decision. A virtual and simulated environment is considered to show the results of the proposed decision theory.


1.0  INTRODUCTION 

When we eat, move or work, it is natural
that we are driven by some will power of our
own. Sometimes this will power is guided or
supported by available options. These
options, though they appear random in
nature, give a solid direction to
decision making. No matter what we choose
to do, our action is based on sets of
decisions that influence our will power. It is
also true that the way a human makes a
decision for the same goal may vary from
time to time. Various types of work related
to human behavior and decision making
have been done. Stewart Robinson has
been investigating the use of artificial
intelligence methods as a means of
representing human decision-making in
simulations since the mid-1990s. One
motivation for his work on modeling human
decision-making was to add extra
complexity to a model in order to improve its
accuracy. The goal of this paper is quite
different from that of the previous work. The
aim of this paper is to establish a theory of
decision making which is based on human
behavior. The theory itself is developed by
observing human acts in real life and through
field surveys. It should be mentioned that the
results presented here are not yet
supported from a psychological point of
view by experts in that field; rather, an
engineering analysis with field level
observation is put together to support the
study.


2.0  METHODOLOGY 

2.1  Approach  for  Analysis 

Consider a simple scenario - a
person wants to go from one place to
another, say from home to work (a home
based work trip). He/she can choose
from three alternative modes - (1) on
foot, (2) by car or (3) by public transport
such as a bus. Now let’s look at when the
person will choose a specific mode of
transportation to reach the destination.

(1) The person will walk from origin to
destination when -

■ The other two options are
unavailable

■ Those options will take more
time than walking when
he/she is in a hurry

■ Relative cost

■ He/she is advised by a
doctor to walk for this
particular trip

■ On his/her way to work,
he/she has to do another task
which is easier if the walking
option is chosen.

(2) The person will choose the car from origin to
destination when -

■ This option is available to
him/her

■ It will take less time than
walking or taking the bus

■ Relative cost

■ Weather condition

■ On his/her way to work,
he/she has to do another task
which is easier if the car option is
chosen

■ He/she can drive or
someone will drive for
him/her

■ It is comfortable as well as
safe and secure.

(3) The person will choose the bus from origin
to destination when -

■ This option is available to
him/her

■ It will take less time than
walking or taking the car

■ Relative cost

■ Weather condition

■ Taking the bus rather than the
car is a better option for this
particular trip

■ It is comfortable as well as
safe and secure

■ He/she is supposed to take
the bus rather than the car
because there is a special
bus service for the office
where he/she works.

Now, if we analyze this
example, we can identify some factors
common to these three particular
options -

■ Availability of the options

■ Relative cost

■ Time required for the trip

■ Safety, security and comfort.

Again, if the person has equal access to
all three alternative options, then the way
he/she decides on a particular
option depends on -

■ Better available option within
the options or

■ Random choice.

This random choice is also a factor, one
related to other specific taste factors like
style, job position and protocols, or simply a
person’s own desire at that very time and
day.

This example reveals that a human
decision for a particular thing (in this case a
home based work trip) may vary depending
on the availability of the options as well as
by random choice. Creating a structured
decision tree in this regard may give a solid
base for explaining human behavior in a
decision theory.

3.0  DECISION  TREE 

In order to put all available options together
for a particular decision, the
following decision tree is shown:


Figure 1: General Decision Tree for Making
a Particular Decision.

In Fig. 1, a decision tree is shown for
making a particular decision. If we relate
this decision tree to the given example, we
can find out how human behavior captures
the options in making a decision. Each box
in the figure corresponds to a different type
of decision set.
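
The mode-choice reasoning from Section 2.1 can be sketched in a few
lines of Python. This is only an illustration under stated assumptions,
not the decision tree of Fig. 1: the `Mode` fields, the `score`
weighting, and the example trip values are all hypothetical, chosen to
mirror the common factors listed earlier (availability, relative cost,
time, and safety/comfort) plus the random choice.

```python
import random
from dataclasses import dataclass


@dataclass
class Mode:
    name: str
    available: bool
    cost: float     # relative cost of the trip (hypothetical units)
    minutes: float  # time required for the trip
    comfort: float  # safety, security and comfort, 0 (worst) to 1 (best)


def score(mode):
    # Lower is better. The equal weighting of cost and time and the
    # factor of 10 on comfort are illustrative assumptions only.
    return mode.cost + mode.minutes - 10 * mode.comfort


def choose_mode(modes, random_choice=False, rng=random):
    """Return the chosen Mode, or None if nothing is available."""
    options = [m for m in modes if m.available]
    if not options:
        return None
    if random_choice:
        # Stands in for personal taste on a given day.
        return rng.choice(options)
    # "Better available option within the options".
    return min(options, key=score)


# Hypothetical home based work trip; the car is not available today.
trip = [
    Mode("walk", True, cost=0.0, minutes=60.0, comfort=0.5),
    Mode("car", False, cost=5.0, minutes=15.0, comfort=0.9),
    Mode("bus", True, cost=1.0, minutes=30.0, comfort=0.6),
]
best = choose_mode(trip)
whim = choose_mode(trip, random_choice=True, rng=random.Random(0))
print(best.name, whim.name)
```

Availability is applied first as a hard filter, so an unavailable mode
can never be selected however well it would score; the random branch
then ignores the score entirely, which is how the text describes a
choice driven by personal desire.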


4.0  DATA  SETS  FOR  ANALYSIS 

The data for this analysis were collected in
Dhaka, the capital of Bangladesh. Two
months’ worth of data from different
locations within Dhaka are used for
the analysis. All the field surveys,
interviews with the public, and observations
are put together for the proposal of the
decision theory. For this study, the surveys
and observations were conducted for a
person’s trip from one place to another
considering the different available options.
Public opinion was used to check how people
react and adapt to changes in
transportation alternatives. Three categories
of people were chosen based on their
income - (1) High income, (2) Middle
income and (3) Low income people.
Income was chosen as the category
because it is one of the most influential
factors that may control the choice of mode
among transportation alternatives.


4.1  Simulated  Analysis 

In addition to real-life field surveys and
observations, a simulated environment was
also created to explain the human behavior.
Because extensive simulation runs are
needed to cover all the decision sets, the
corresponding results are not yet available.


5.0  RESULTS 

The results of this analysis can be
explained from the observed data as well as
the field surveys. It was found that the flexible
option is available only to people of
higher income and, in some cases, to
middle income people. People with low
income have nothing but the rigid option.
Therefore, if we want to investigate the
movement of low income people
based on their use of transportation
alternatives, we do not need to go into further
detail - they can reach a place only
on foot or by public transportation.

Now, let’s take into account the high and
middle income people. In this case, it is
quite uncertain which alternative mode they
will consider for travel - personal car, bus,
or walking? They have the flexible option,
and when a category of people has
this flexible option, in-depth analysis is
necessary to explain their movement. The
analysis showed that people will
choose the better option among all available
options if and only if we can control the
random choice - which is nothing but a
choice based on personal satisfaction.

A question may arise as to why a decision
theory based on human behavior is proposed,
or what the importance of this type of
research is. First of all, if we can analyze
human behavior for a particular action (in
this research, the choice of
alternative modes is the primary area of
analysis), we may go on to control that
behavior, and thus do further analysis for
controlling specific kinds of behavior when
we need to do so. For example, in the field of
transportation engineering, we sometimes
need to introduce special traffic
management policies on a certain route or
area, either for a special event or to control
and avoid traffic jams. A city without proper
planning and with insufficient roads may
suffer gridlock and therefore
need to control human behavior in choosing
alternative modes of transportation. The
study area, Dhaka, is now facing extreme
traffic jams, resulting in economic loss to the
country. One option for solving the problem is
to control human behavior in selecting
alternative modes of travel. If we can
control how people decide on a mode
in this city area, we
may overcome the problem to a certain
extent. The way to do this is to control
the flexible option of the people. Though this
is not a fair way to guide human decisions
onto a particular track, to some extent it is
better. The reason is that, if people become
dependent on private cars and ignore
public transportation, and the use of private
cars by 20% of the people in the city center
causes a problem for the remaining 80%,
it is better to cut the
flexible option of that 20% to use
their private cars. There may be arguments
on this matter, but the goal of this paper at
this point is not to take up these arguments,
but rather to suggest a theory of decision
making.


6.0  CONCLUSION 

Human  behavior  is  a complex  nature  to 
explain  with  theoretical  or  simulated 
analysis.  But,  real-life  observation  with 
practical  field  survey  can  help  to  explain  this 
complex  nature  in  simulated  environment. 
“The  motivation  is  to  model  human  decision- 
making so  it  is  better  understood  and  it  can 
be  improved.  This  should  help  to  improve 
the  performance  of  the  systems  in  which  the 
humans  are  interacting.  The  concentration 
is  no  longer  on  making  models  more 
accurate,  but  on  using  the  models  to  assess 
the  effects  of  human  interaction  and  to  look 
for  ways  of  changing  the  human  interaction 
in  order  to  improve  system  performance. 
Model  accuracy  plays  a secondary  role  to 
generating  insight  and  understanding.  This 
is  the  motivation  behind  the  knowledge 
based  improvement  methodology,”  (Stewart 
Robinson, Modelling Human Decision-Making,
para. 27). The aim of this paper is
to do this with supporting evidence. The
only limitation that may exist lies in the
psychological analysis of this particular
study. Otherwise, this work offers a unique
approach to proposing a decision theory.
Abstract. We adopt Markov Decision Processes (MDPs) to model sequential decision problems, in which
the current decision made by a human decision maker has an uncertain impact on future opportunity. We hypothesize that the
individuality of decision makers can be modeled as differences in the reward function under a common MDP model. A machine
learning technique, Inverse Reinforcement Learning (IRL), was used to learn an individual’s reward function based on limited
observation of his or her decision choices. This work serves as an initial investigation into using IRL to analyze decision making,
conducted through a human experiment in a cyber shopping environment. Specifically, the ability to determine the demographic
identity of users is assessed through prediction analysis and supervised learning. The results show that IRL can be used to
correctly identify participants at a rate of 68% for gender and 66% for one of three college major categories.


1 INTRODUCTION 

There  has  been  significant  work  in  the  field 
of  machine  learning  to  understand  human 
decision  making.  Inverse  Reinforcement 
Learning  (IRL)  is  a method  for  computers  to 
learn  to  perform  complex  tasks  by  watching 
human  operators  [2].  IRL  is  built  upon 
Markov  Decision  Processes  (MDPs),  which 
examine  sequential  decision  making  over 
time.  Decision  makers  are  modeled  to 
choose  actions  based  upon  maximizing 
reward,  which  is  captured  by  a reward 
function  that  assigns  preferences  to  being  in 
certain  states.  Decisions  made  in  the 
present  directly  impact  future  decisions  and 
opportunities,  often  stochastically,  so  short- 
term gain  must  be  balanced  against  future 
goals.  Decisions  are  complex  because  an 
individual  may  have  many  actions  to  choose 
between  and  may  have  to  assimilate 
various  pieces  of  information  and  trade-offs 
between  conflicting  goals.  These  types  of 
decisions  are  commonplace  in  daily  life, 
from  choosing  which  lane  to  drive  in  on  the 
interstate  to  choosing  when  to  buy  or  sell 
stocks. 


Our  thesis  is  that  IRL  techniques  can  be 
used  to  understand  human  decision  making 
by  creating  a mathematical  model  of  the 
human’s  decision  strategy.  We  do  not  claim 
that  people  solve  complex  mathematical 
formulae  mentally  while  making  difficult 
decisions;  however,  a projection  of  their 
preferences  can  be  captured  through 
machine  learning.  Specifically,  we  can  begin 
to  understand  under  which  conditions  an 
individual  would  take  a certain  action  and 
therefore  find  if  people  adopt  different 
strategies  to  the  same  problem.  There  is 
reason  for  optimism  that  IRL  can  model 
decision  making.  Researchers  have  run 
controlled  experiments  where  a participant 
is  instructed  to  exhibit  certain  preferences 
and  have  shown  heuristically  that  a 
computer  is  able  to  mimic  the  behavior  by 
solving  a mathematical  version  of  the 
problem  [2].  We  feel  that  IRL  does  indeed 
capture  aspects  of  an  individual's  true 
decision  rules,  but  the  previous  work  has 
not  tried  to  verify  this  important  requirement 
for  many  applications  through  rigorous 
analysis. 
1.1  Expected  Contribution

This  work  identifies  a bridge  between  those 
who  develop  solutions  to  sequential 
decision  problems  and  those  who  have 
methods  to  test  and  quantify  human 
behavior.  In  broad  terms,  the  two  fields  can 
be  defined  as  machine  learning  and 
cognitive science. Machine learning
encompasses artificial intelligence and
reinforcement learning, whose researchers
train computers to solve decision problems
that may be too difficult for humans to solve.
Cognitive  science  studies  how  the  human 
brain  uses  information,  and  cognitive 
scientists  run  controlled  experiments  to 
investigate  the  impact  of  some  changing 
condition  on  human  performance.  The  two 
fields  join  when  researchers  use  machine 
learning  algorithms  to  understand  human 
decision  making.  This  work  lies  in  this 
middle  area,  as  we  investigate  the  potential 
of  IRL  to  analyze  decision  strategies 
through  human  experimentation. 

In  the  machine  learning  literature  found 
predominantly  in  the  engineering  field, 
researchers  have  not  validated  that  IRL 
captures  human  decision  making  through 
robust  experimentation.  The  literature  is 
focused  on  improving  algorithms  in  terms  of 
speed  and  accuracy  [6],  or  adapting  work  to 
apply  to  a larger  class  of  problems  [1,3]. 
The  algorithms  are  heuristically  validated  by 
instructing  human  experts  to  follow  different 
strategies  that  map  well  onto  the  qualities 
the  computer  was  trained  to  learn.  The 
machine  learning  literature  lacks  hypothesis 
testing  that  would  demonstrate  that  IRL  can 
find  differences  in  decision  making  between 
groups  of  people,  and  we  therefore  look  to 
the  cognitive  science  field  to  find  studies 
analyzing  human  sequential  decision 
making. 

Cognitive science is devoted to
understanding how humans make use of
information in the brain and is therefore
closely related to characterizing decision
making. Researchers in cognitive science
make use of human experiments to perform
hypothesis testing, often to compare two
groups of people to one another. There
have  been  studies  where  IRL  and  MDPs 
could  be  used  to  analyze  the  data  gathered 
from  human  experiments,  but  researchers 
lost  power  by  only  using  results-based 
analysis.  For  instance,  [4]  performed  a 
sophisticated  experiment  with  a motorcycle 
simulator  and  asked  the  riders  to  identify 
potential  hazards  and  collected  eye-gaze 
data.  The  researchers  could  have  sought  to 
understand  where  the  user  was  looking  as  a 
function  of  the  objects  on  the  screen,  but 
instead  were  relegated  to  analyzing  the 
higher-level  metric  of  general  size  of 
viewing  area. 

There  has  been  a great  deal  of  work  in 
the  economics  field  to  investigate  the  ability 
of  mathematical  models  to  describe  real 
human  behavior.  Ref.  [5]  completed  a 
survey  of  research  in  predominantly  the 
economic  field  that  analyzed  human 
decision  making  with  respect  to  MDPs. 
They  found  that  humans  perform  near- 
optimal  behavior  in  discrete  decision 
problems,  but  the  opposite  was  true  for 
continuous  decision  problems.  As  a case 
study,  they  highlighted  work  by  RAND 
where  the  decision  of  Air  Force  pilots  to 
remain  in  service  or  retire  to  the  civilian 
sector  was  analyzed.  Among  other  practical 
conclusions,  the  work  showed  that 
prediction  is  a valid  method  for  testing 
MDPs  as  a decision  framework. 

2 MATHEMATICAL  FORMULATION 

IRL  refers  to  any  method  where  a reward 
function  is  learned  to  mimic  expert  behavior 
through observation [2]. The foundational
premise is that a rational actor may choose
between several actions and may conduct
analysis to determine the best course of
action. Decisions are captured in a
mathematical  model  that  can  be  analyzed 
and  optimized  to  find  the  best  action.  The 
theory  is  applied  to  sequential  decision 
making  where  the  actor  will  have  to  make  a 
series  of  time-ordered  decisions.  This  raises 
a difficult  problem  that  requires  analysis  to 
solve  because  current  actions  impact  future 
decisions  and  opportunities. 

2.1  Markov  Decision  Processes 

IRL  uses  the  well-understood  framework  of 
Markov  Decision  Processes  (MDP).  MDPs 
are  built  upon  the  idea  that  all  of  the 
information  one  needs  to  make  a decision  is 
characterized  by  the  state  of  the  system. 
Markov  chains  become  powerful  when 
applied  to  decision  making  because  the 
probability  of  transitioning  to  a certain  state 
is  dependent  on  the  current  state  and  the 
decision  maker's  action.  The  decision  maker 
chooses  actions  at  every  time  point  with  the 
updated  knowledge  of  his  or  her  situation. 
Decisions can be chosen greedily to
maximize short-term gain, but because
decisions made in the present directly
affect future opportunities, a farsighted
strategy is needed to make the best
possible decisions.

We  use  the  notation  from  [2]  to 
formulate  Markov  Decision  Processes.  An 
MDP  is  fully  described  by  the  tuple 
$(S, A, T, \gamma, D, R)$, where:

• $S$ is the set of all possible states, and the
state at time $t$ is given by $s_t$

• $A$ is the set of all possible actions, and
the particular action chosen at time $t$ is $a_t$

• $T$ is the function of state transition
probabilities

• $\gamma \in [0,1)$ is a discount factor

• $D$ is the initial-state distribution

• $R$ is the transition reward gained from
taking action $a_t$ at $s_t$ while transitioning
to $s_{t+1}$

Once the MDP has been completely
formulated, the goal is to solve the problem
by developing an optimal policy $\pi$ that maps
an optimal action to every state. Due to the
stochastic nature of MDPs, the objective is
to choose actions that maximize total
expected reward. The goal of the decision
maker is to find the $\pi$ that maximizes this
expected reward and therefore know which
action to choose at $t = 0$. Once the system
transitions to the next state at $t = 1$, the
actor has the information necessary to take
the best action; i.e., the actor does not
determine at $t = 0$ how he or she will act in
the future. Once the problem has been
formulated as such, the optimal policy may
be derived through dynamic programming or
reinforcement learning.
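
As a minimal sketch of the dynamic programming route, the following Python code runs value iteration on a small hypothetical MDP. The two-state, two-action problem, its transition probabilities, and its rewards are illustrative assumptions and are not taken from the experiment; only the structure (states, actions, transition probabilities, discount factor, reward) follows the formulation in this section.

```python
# Value iteration on a toy MDP. All numbers below are illustrative assumptions.
STATES = [0, 1]
ACTIONS = [0, 1]
GAMMA = 0.9  # discount factor in [0, 1)

# T[s][a] is a list of (next_state, probability) pairs.
T = {
    0: {0: [(0, 1.0)], 1: [(1, 0.8), (0, 0.2)]},
    1: {0: [(0, 1.0)], 1: [(1, 1.0)]},
}

# R[s][a] is the reward for taking action a in state s.
R = {
    0: {0: 0.0, 1: 0.0},
    1: {0: 0.0, 1: 1.0},
}


def q_value(s, a, values):
    """Immediate reward of (s, a) plus the discounted expected future value."""
    return R[s][a] + GAMMA * sum(p * values[s2] for s2, p in T[s][a])


def value_iteration(tol=1e-9):
    """Return (values, policy) once successive value estimates stop changing."""
    values = {s: 0.0 for s in STATES}
    while True:
        new_values = {s: max(q_value(s, a, values) for a in ACTIONS) for s in STATES}
        delta = max(abs(new_values[s] - values[s]) for s in STATES)
        values = new_values
        if delta < tol:
            break
    policy = {s: max(ACTIONS, key=lambda a: q_value(s, a, values)) for s in STATES}
    return values, policy


values, policy = value_iteration()
```

In this toy problem the farsighted policy takes action 1 in state 0 even though its immediate reward is zero, because it leads to the rewarding state 1.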

2.2  Discretized-Reward  Search 
Method  for  IRL 

As  discussed  above,  the  computer  learns  to 
mimic  a human  by  learning  the  problem  that 
the  expert  is  attempting  to  solve.  [2]  places 
constraints  on  the  problem  definition  so  that 
IRL  uses  a linear  reward  function  in  order  to 
apply  standard  optimization  techniques  to 
perform  policy  evaluation.  If  we  relax  these 
constraints,  then  we  void  the  developed 
algorithm  and  must  perform  IRL  in  another 
manner. We have developed an exhaustive
search algorithm that discretizes the space
of reward functions into a finite set in order
to attribute reward functions to actions.
Although it has its limitations, it works for a
broader class of problems.

The  process  of  mapping  a reward 
function  to  an  observed  action  path  x is  as 
follows: 
1. Start with initial weight $w^0$, which is the
starting point for weight iteration. There
must be some method to iterate through
all of the feasible weights. For example,
if we choose $\|w\|_1 = 1$, then the first
weight could be $w^0 = (1, 0, \ldots)$, and the
next weight would be $w^1 = (0.9, 0.1, \ldots)$.
Set $i = 0$.

2. Solve or approximate the optimal policy
to the MDP where $R = w^i \cdot \phi(s)$.
Simulate an action path using the
optimal policy and set as $x^i$. Use the
size of observed actions $x$ as stopping
criteria if necessary.

3. Use a reward distance function to find
the difference in the rewards generated
by $x$ and $x^i$ with respect to $w^i$ and set
as $d^i$.

4. If $w^i$ is not the last weight, then find
$w^{i+1}$ and set $i = i + 1$. Return to Step 2.

5. Find the minimum value for $d^i$ and
create a set of all the $w^i$ with that value.
These are all of the weights and
corresponding reward functions that
match the observed actions.

There  are  several  design  choices  in  the 
problem  definition  that  are  necessary  to 
implement  this  method.  The  set  of  all  weight 
vectors  must  be  discretized  into  a finite 
countable  set,  and  there  must  be  a method 
for  iterating  through  the  set.  The  MDP  must 
either  be  able  to  be  solved  through  dynamic 
programming  or  an  optimal  solution  must  be 
approximated  with  reinforcement  learning. 
Finally,  a distance  function  must  be 
developed  to  compare  the  expert’s  policy 
and  optimal  policies  generated  for  candidate 
reward  functions. 
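
The search procedure of this section can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the reward is the diminishing-returns form that the paper gives as Eq. (3.1), the MDP is "solved" by a greedy rollout that always takes the action with the highest immediate reward, the distance is the sum of absolute differences in accumulated reward at each step, and the constant `M`, the function names, and the two-type example are hypothetical.

```python
from itertools import product

M = 4  # assumed maximum number of useful views per page type


def weight_grid(n_types, granularity):
    """Yield every nonnegative weight vector on the grid whose entries sum to 1."""
    for steps in product(range(granularity + 1), repeat=n_types):
        if sum(steps) == granularity:
            yield tuple(k / granularity for k in steps)


def reward(weights, counts, action):
    """Diminishing-returns reward for viewing page type `action` once more."""
    return weights[action] * (M - (counts[action] + 1))


def greedy_path(weights, length):
    """Stand-in for solving the MDP: always take the highest immediate reward."""
    counts = [0] * len(weights)
    path = []
    for _ in range(length):
        action = max(range(len(weights)), key=lambda a: reward(weights, counts, a))
        path.append(action)
        counts[action] += 1
    return path


def cumulative_rewards(weights, path):
    """Accumulated reward after each step of an action path."""
    counts = [0] * len(weights)
    total, out = 0.0, []
    for action in path:
        total += reward(weights, counts, action)
        counts[action] += 1
        out.append(total)
    return out


def path_distance(weights, path_a, path_b):
    """Assumed distance: summed absolute gap in accumulated reward per step."""
    cum_a = cumulative_rewards(weights, path_a)
    cum_b = cumulative_rewards(weights, path_b)
    return sum(abs(x - y) for x, y in zip(cum_a, cum_b))


def discretized_reward_search(observed, n_types, granularity):
    """Steps 1-5: score every candidate weight and keep all minimizers."""
    scored = []
    for w in weight_grid(n_types, granularity):
        simulated = greedy_path(w, len(observed))
        scored.append((path_distance(w, observed, simulated), w))
    best = min(d for d, _ in scored)
    matches = [w for d, w in scored if abs(d - best) < 1e-12]
    return best, matches


# Usage: recover candidate weights for a path generated with known weights.
observed = greedy_path((0.7, 0.3), 4)
best, matches = discretized_reward_search(observed, n_types=2, granularity=10)
```

Several grid points can tie at the minimum distance, which is why the search returns a set of weights rather than a single vector.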

3 METHODOLOGY 

We  conducted  human  experiments  to 
investigate  the  capability  of  using  inverse 
learning  methods  to  perform  identity 
prediction.  A task  that  meets  the  criteria  of  a 
sequential  decision  problem  is  online 
shopping. Shoppers navigate an online
environment searching for items, and their
actions  can  be  readily  extracted  from 
looking  at  browsing  history  data.  By 
recording  their  browsing  history,  we  have  a 
noninvasive  sequential  view  of  their  actions 
and  can  determine  how  the  user  assimilated 
information  to  make  decisions.  Inverse 
learning  calculates  the  user's  policy  in  all 
situations  and  will  describe  the  user's 
objective  function.  We  will  be  able  to 
characterize  how  a particular  user  performs 
the  task  of  shopping  for  an  item. 

We developed an experiment to test
how participants perform the task of
purchasing a gift using an online shopping
website. Each participant underwent a
30-minute experiment, during which they
performed 4 trials. At the start of each trial,
the participant was given a profile of a person
to buy a gift for, which included personal
characteristics and possible suggestions of
what that person may like or dislike. The
participant was given 5 minutes and a budget
of $100 to perform the task, during which time
he or she browsed the item selection
provided by the website and selected one or
more gifts to purchase. Participants were
not given any instruction beyond the
profile of the gift recipient and the direction
to remain on the shopping site and not view
another site. After some pretesting, we
determined there were 10 predominant types
of pages available at Walmart.com (e.g., store
department page, item list page, and
checkout page).

3.1  Setup  of  the  MDP  and
Corresponding  IRL  Method

We  set  the  state  vector  to  represent  the 
number  of  pages  of  each  type  the  user  has 
viewed.  State  transitions  are  deterministic, 
as  the  user  fully  decides  which  page  type  to 
view  next.  With  a standard  reward  function, 
the  optimal  policy  would  simply  choose  to 
view  the  page  type  with  the  highest  reward 
over and over again. A reward function that
causes users to switch pages, as opposed
to choosing the same one over and over
again, would be one that takes into account
the law of diminishing returns. A user may
prefer to view one page type over another,
but as they view that type multiple times they
receive decreasing reward. If we let $M$ be
the maximum number of pages a user
wishes to view of a certain type, we can
scale the reward gained from choosing a
page by a factor that decreases linearly with
the number of times that page type has been
viewed, up to $M$ visits. In Eq. (3.1),
$a$ is the action corresponding to the page
the user wants to view next, $s$ is the
complete state, $s_a$ is the current number of
pages of type $a$ that the user has viewed,
and $w_a$ is the weight corresponding to that
page type.

$R(s,a) = w_a \cdot (M - (s_a + 1))$  (3.1)
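
To illustrate how Eq. (3.1) produces page switching, the short Python sketch below evaluates the reward for successive views. The page-type names, the weights 0.6 and 0.4, and the value M = 5 are hypothetical values chosen for the example and do not come from the experiment.

```python
def transition_reward(weight, views_so_far, max_views):
    """Eq. (3.1): reward for viewing a page type one more time."""
    return weight * (max_views - (views_so_far + 1))


MAX_VIEWS = 5  # hypothetical M
WEIGHTS = {"item_list": 0.6, "checkout": 0.4}  # hypothetical weights

# Reward for each successive view of the preferred page type: it shrinks
# by a constant step with every additional view.
item_list_rewards = [
    transition_reward(WEIGHTS["item_list"], k, MAX_VIEWS) for k in range(MAX_VIEWS)
]

# After two item-list views, compare the reward of the next view of each type.
views = {"item_list": 2, "checkout": 0}
next_rewards = {
    page: transition_reward(WEIGHTS[page], views[page], MAX_VIEWS) for page in WEIGHTS
}
best_next_page = max(next_rewards, key=next_rewards.get)
```

Here the third item-list view is worth about 1.2 while a first checkout view is worth about 1.6, so a reward-maximizing user switches page types even though the item list has the larger weight.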

This reward function is nonlinear; it is
not a linear combination of the state variables
because only the part of the state regarding
the action taken contributes to the transition
reward. We therefore use the Discretized-Reward
Search Method. There are many
different ways to discretize the space. We
chose to have each weight be nonnegative
and the weights sum to 1, so that the
possible values for each weight $w_j$ lie in
[0, 1]. We also set the granularity of
each weight, such that a value of 10 means
we divided the range [0, 1] into 10 equal
parts, i.e., $w_j = 0.0, 0.1, \ldots, 0.9, 1.0$. The
analysis reported here was performed using
a finer granularity of 20.
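
The size of the resulting search space follows from a standard counting argument: nonnegative grid weights that sum to 1 correspond to ways of distributing the granularity steps among the weights. The Python sketch below, with hypothetical function names, counts and enumerates these grid points; the figure of 10 weights corresponds to the 10 page types identified in pretesting.

```python
from math import comb


def grid_size(n_weights, granularity):
    """Number of nonnegative grid vectors with n_weights entries summing to 1."""
    return comb(granularity + n_weights - 1, n_weights - 1)


def enumerate_grid(n_weights, granularity):
    """Yield integer step counts per weight; divide by granularity for weights."""
    if n_weights == 1:
        yield (granularity,)
        return
    for first in range(granularity + 1):
        for rest in enumerate_grid(n_weights - 1, granularity - first):
            yield (first,) + rest


coarse = grid_size(10, 10)  # 10 weights, steps of 0.1
fine = grid_size(10, 20)    # 10 weights, steps of 0.05
small_grid = list(enumerate_grid(3, 4))
```

Under these assumptions, moving from granularity 10 to 20 grows the candidate set from 92,378 to 10,015,005 weight vectors, which illustrates why the exhaustive search is computationally expensive.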

We  developed  a method  to  find  the 
distance  between  two  policies  under  a 
single  reward  function.  Instead  of  simply 
counting  how  many  times  the  user  policy 
and  optimal  policy  differed,  we  used  the 
amount  of  reward  each  policy  generated  as 
a differencing metric. The Incremental
Reward Difference (IRD) method compares
two action paths by sequentially examining
each time period and finding the difference
in the total accumulated reward up to that
point. For example, consider a simple
reward function $R = 0.4 s_1 + 0.6 s_2$, one
policy of (1, 1, 2, 2), and another
policy of (2, 2, 1, 1). The total reward
accumulated by both policies is 2.0, so it is
important to have a metric that takes into
account sequence order. In our method, the
difference of the total reward accumulated
after the first period is 0.2 (0.6-0.4), after the
second it is 0.4 (1.2-0.8), after the third it is
0.2 (1.6-1.4), and after the fourth it is 0 (2.0-2.0).
Therefore, the difference between the
policies is 0.8, which takes into account
sequence and end result.
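
The worked example can be reproduced with a few lines of Python. This is a sketch under two assumptions: each action of type 1 earns 0.4 and each action of type 2 earns 0.6, as the example's arithmetic implies, and the per-period differences are taken as absolute values, which the text does not state explicitly. The function name is hypothetical.

```python
from itertools import accumulate


def incremental_reward_difference(rewards_a, rewards_b):
    """Sum over periods of the absolute gap in accumulated reward."""
    cum_a = accumulate(rewards_a)
    cum_b = accumulate(rewards_b)
    return sum(abs(x - y) for x, y in zip(cum_a, cum_b))


STEP_REWARD = {1: 0.4, 2: 0.6}  # per-action rewards from the worked example
path_a = (1, 1, 2, 2)
path_b = (2, 2, 1, 1)

ird = incremental_reward_difference(
    [STEP_REWARD[a] for a in path_a],
    [STEP_REWARD[a] for a in path_b],
)
```

Up to floating-point rounding, `ird` equals the 0.8 obtained in the text, while comparing only the final totals would give a difference of zero.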

For  each  experiment  observation,  we 
store  all  of  the  reward  functions  that  were 
closest  to  the  expert  and  use  a measure  of 
central  tendency  as  the  point  estimate  of  the 
true  reward  function.  The  standard  method 
to  measure  distance  between  two  n-tuple 
vectors  is  Euclidean  distance.  Standard 
cluster  analysis  uses  the  centroid  as  the 
averaging  measure  for  a group  of  points, 
but  this  most  likely  will  lead  to  an  impossible 
reward  function.  Instead,  we  find  the  medoid 
(found  in  k-medoid  cluster  analysis),  which 
is  the  element  in  the  cluster  that  has  the 
shortest  average  distance  to  every  other 
point  in  the  cluster. 
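
A minimal Python sketch of the medoid computation is given below. The four weight vectors are hypothetical and serve only to show that the medoid, unlike the centroid, is always one of the candidate reward functions.

```python
from math import dist


def medoid(points):
    """Return the point with the smallest total Euclidean distance to the others."""
    return min(points, key=lambda p: sum(dist(p, q) for q in points))


def centroid(points):
    """Component-wise mean, shown only for comparison."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


# Hypothetical set of equally close weight vectors on a 0.1 grid.
candidates = [
    (0.5, 0.5, 0.0),
    (0.6, 0.4, 0.0),
    (0.7, 0.3, 0.0),
    (0.0, 0.0, 1.0),
]

point_estimate = medoid(candidates)
mean_vector = centroid(candidates)
```

For these points the medoid is (0.6, 0.4, 0.0), whereas the centroid is approximately (0.45, 0.325, 0.25), which does not lie on the 0.1 grid and is not one of the stored reward functions.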

3.2  Weights  of  Evidence  Prediction 
Models 

Rating  the  quality  of  generated  rewards  by 
IRL  is  directly  dependent  on  the  application. 
We  have  chosen  to  examine  identity 
prediction  in  the  sense  that  we  could  find 
someone’s  reward  function  and  correlate 
identifying  information  by  comparing  against 
known  data.  We  therefore  desire  the  reward 
functions  to  group  people  into  clusters 
based  upon  demographic  similarities.  In  this 
section we discuss how we rate whether
meaningful  clusters  are  formed  by  analyzing 
experimental  data. 

Scoring models can be used to identify
separation in the data and provide a means
for prediction. Weights of Evidence (WOE)
are used to convert data from an individual
into a single score, and it is desired that
scores are able to differentiate people.
Scoring models predict a binary outcome,
such as good (G) or bad (B), according to a
vector of features. Given a feature vector $x$,
the quantities of interest are $P(G \mid x)$ and
$P(B \mid x)$. The score $s$ is the log odds score,
which can be broken into a population score
$s_{pop}$ and an information odds score $s_{inf}$ by
using Bayes' Rule and the properties of
logarithms, as shown in Eq. (4.7).

$$s(x) = \ln\frac{P(G \mid x)}{P(B \mid x)} = \ln\frac{P(G)}{P(B)} + \ln\frac{f(x \mid G)}{f(x \mid B)} = s_{pop} + s_{inf}(x) \qquad (4.7)$$


The information odds score can be
calculated from the data using the distribution
of values that the feature vector takes given
that the person is good or bad. If each variable
in the feature vector is conditionally
independent given that the individual is good or
bad, then the information score is given by
Eq. (4.8).

$$s_{inf}(x) = \sum_{j} \ln\frac{f(x_j \mid G)}{f(x_j \mid B)} \qquad (4.8)$$

Each log odds term in the information score is
the WOE indicating G for that particular
variable. The WOE is the log of the ratio between
the likelihood that the feature $x_j$ takes on a
particular value given the person is good and
the same likelihood given the person is bad, and
it can be directly calculated from the data. For
instance, the value $f(x_1 = 0.1 \mid G)$ is the
proportion of the number of good people where
$x_1 = 0.1$ over the total number of good people.
This method requires a discretization of each
variable $x_j$ into multiple bins.
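
The WOE calculation described in this section can be sketched in Python as follows. The function names and the six-trial data set are hypothetical, and the `smoothing` parameter is an assumption added for the sketch (it is not described in the paper) so that an empty bin does not produce a logarithm of zero.

```python
from collections import Counter
from math import log


def woe_table(values, is_good, smoothing=0.5):
    """Map each bin of one feature to ln(f(bin | G) / f(bin | B)).

    `smoothing` is an assumed additive correction, not part of the paper,
    that keeps empty bins from causing log(0) or division by zero.
    """
    bins = sorted(set(values))
    good = Counter(v for v, g in zip(values, is_good) if g)
    bad = Counter(v for v, g in zip(values, is_good) if not g)
    n_good = sum(good.values()) + smoothing * len(bins)
    n_bad = sum(bad.values()) + smoothing * len(bins)
    return {
        b: log(((good[b] + smoothing) / n_good) / ((bad[b] + smoothing) / n_bad))
        for b in bins
    }


def log_odds_score(features, tables, n_good, n_bad):
    """Population log odds plus the sum of per-feature weights of evidence."""
    s_pop = log(n_good / n_bad)
    s_inf = sum(table[x] for table, x in zip(tables, features))
    return s_pop + s_inf


# Hypothetical binned values of one reward weight for six trials,
# with "good" standing for the target group (here, male).
weight_bins = [0.1, 0.1, 0.2, 0.1, 0.2, 0.2]
is_male = [True, True, True, False, False, False]

table = woe_table(weight_bins, is_male, smoothing=0.0)
score = log_odds_score((0.1,), [table], n_good=3, n_bad=3)
```

In this toy data two of three males but only one of three non-males fall in the 0.1 bin, so that bin's WOE is ln 2 and a trial in that bin receives a positive score.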

4 IMPLEMENTATION  AND 
RESULTS 

We discuss our findings with the caveat that
the analysis was exploratory, and there was
no previous work or principle establishing that
people grouped according to the tested
demographic factors are expected to
perform the task differently (e.g., there is no
definitive theory that males utilize a different
shopping strategy than females). However,
IRL  methods  that  find  more  correlation 
between  demographic  group  and  strategy 
are  preferable,  and  this  metric  can  be  used 
in  model  selection  when  choosing  between 
several  predictive  methods. 

4.1  Results  from  WOE  Scoring 
Models 

For each IRL model, we developed credit
scoring models for the gender and major
variables. For the binary variable gender, we
built a model for male versus not male, while
for major we had to build three models: arts
versus not arts, engineering versus not
engineering, and commerce versus not
commerce. Each model was built using the
10  weights  from  the  reward  function  as 
predictive  features.  The  features  were 
separated  into  bins  based  upon  taking 
values  of  0 through  0.3  and  an  additional 
bin  for  being  greater  than  0.3.  Once  the 
weights  of  evidence  were  calculated  by 
determining  the  log  odds  that  a feature  took 
a particular  value,  scores  were  assigned  to 
each  trial  based  upon  the  reward  function. 
Frequency  plots  showed  the  distribution  of 
scores  according  to  the  group  the  individual 
belonged  to.  The  frequency  plots  for  the 
model  are  shown  in  Fig.  4.1. 
Fig. 4.1. Results from WOE scoring
according to gender and school

The scoring models based upon WOE
had the potential to perform two tasks:
identifying separation and providing
predictive power. A benefit of the scoring
analysis was the ability to visualize the data.
If there was separation between the two
demographic curves, we could determine an
optimal score threshold and test for
accuracy with training and testing data, as in
the regression analysis.

To  investigate  the  predictive  power  of 
the  scoring  models,  Receiver  Operating 
Characteristic  (ROC)  curves  were  built  to 
show  the  tradeoff  between  sensitivity  and 
specificity  with  choosing  a particular  cutoff 
point.  For  instance,  one  may  choose  a 
cutoff  such  that  a high  percentage  of  males 
were  correctly  labeled  as  males,  but  in 
general  there  is  a tradeoff  associated  with 
having  an  increased  number  of  females  that 
are  incorrectly  labeled  as  males.  The  former 
is  the  true  positive  rate,  while  the  latter  is 
the  false  positive  rate,  and  a sample  ROC 
curve  is  shown  in  Fig.  4.2  for  gender. 


Fig. 4.2. ROC curve for gender

An  ideal  ROC  curve  would  be  one  that 
included  the  point  (0,1)  indicating  it  was 
possible  to  achieve  a 100%  true  positive 
rate  with  a 0%  false  positive  rate.  Using  this 
logic,  curves  are  measured  by  the  area 
under  the  curve  (AUROC)  where  a value  of 
1 is  considered  the  best  while  0.5  is  the 
worst.  We  show  the  AUROC  score  for  the 
model  in  Table  4.1. 
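
As a minimal sketch, AUROC can be computed directly from scores without tracing the curve, using the standard equivalence between the area under the ROC curve and the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (ties counted as one half). The scores and labels below are hypothetical.

```python
def auroc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical WOE scores; label 1 marks the target group (e.g., male).
labels = [1, 1, 0, 0]
perfect = auroc([0.9, 0.8, 0.4, 0.3], labels)  # every positive outranks every negative
partial = auroc([0.9, 0.4, 0.8, 0.3], labels)  # one of four pairs is misordered
```

A score that separates the groups completely gives 1.0, and scores unrelated to group membership give values near 0.5.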

We developed decision rules to identify
each participant and recorded the number of
correct identifications. As an example, we
found that classifying those with an
engineering score above 1.05 as engineers
and those below as non-engineers yielded a
78% success rate. To discriminate further, we
separated the non-engineers based upon
a commerce score threshold of 0.92, and
subsequently had a total success rate
based on major of 66%.


Table 4.1. Performance metrics to predict
user identity

               AUROC    % Correct
Gender         0.745    67.6%
Arts & Sci     0.718    68.6%
Engineering    0.716    77.9%
Commerce       0.810    86.2%
Total Major    N/A      66.2%

5 CONCLUSION 

Inverse  reinforcement  learning  has  the 
capability  to  quantify  human  decision 
making  through  observation.  This  machine 
learning method can be used in many
applications, including attribution. However,
the literature does not verify that IRL
captures real decision making. IRL has
previously been tested only heuristically, with
its merit demonstrated through controlled
experimentation. In
this  work,  IRL  was  used  to  analyze  human 
behavior  in  experiments  where  the 
participants  were  not  given  any  instruction 
regarding  strategy.  The  most  difficult  aspect 
of  performing  IRL  is  developing  an  MDP 
that  can  capture  the  different  strategies  real 
participants  use  when  performing  a task. 
We  provided  a methodology  that  allows 
researchers  to  statistically  test  the  ability  of 
various  IRL  models  to  map  reward  functions 
to  actions  with  respect  to  some  application, 
in  this  case  attribution.  Models  were 
compared  based  upon  group  significance 
testing  and  predictive  power.  These 
statistical methods can be used with any
IRL scheme to test its usefulness with
respect to attribution.

Without IRL, it is very difficult to
understand the strategy that each
participant used to perform shopping. At
most, an analysis without IRL could only
examine the relative frequencies of the number
of times each page type was visited, and would
lose any information on the order in which the
participant viewed pages. People choose
the next page as a direct result of the page
they are currently viewing, their overall
preferences regarding the final goal, and the
steps required to achieve satisfaction. Most
work on analyzing differences among humans
chooses to test the change in an observable
variable, and it is rare to see analysis of the
mathematical formulation of strategy.

The  next  step  in  assessing  IRL  as  it 
pertains  to  capturing  decision  making  is  to 
analyze  individual  consistency.  This  work 
focused  on  analyzing  differences  between 
groups,  whereas  consistency  analysis 
would  investigate  similarities  of  an  individual 
over time. The primary goal of consistency
analysis would be to show that an individual
has an underlying strategy to perform tasks
and that, although actions may appear to be
different across trials where the individual is
placed in new situations, the strategy
captured by the reward function remains
constant. This would serve to
demonstrate that the user has a reward
function and that IRL could recover the
correct one. Users would need to be
observed performing the same task multiple
times, which would require testing beyond
the data gathered for this experiment.
Markov  Decision  Processes


• $S$ is the set of all possible states; the state at time $t$ is given by $s_t$
• $A$ is the set of all possible actions, given you're in state $s_t$
• $T$ is the transition probabilities, given you're in $s_t$ and chose $a_t$
• $\gamma \in [0,1)$ is a discount factor
• $D$ is the initial-state distribution
• $R$ is the transition reward gained from taking action $a_t$ at $s_t$


Inverse  Reinforcement  Learning  (IRL) 


[Figure: list of chess moves]
Defining  Reward  Function


What move to make?

$R(s,a) = \sum_i w_i \phi_i$

What to eat?

$R(s,a) = w_a (M - (s_a + 1))$


Selecting  Optimal  Reward  Function 


Linear Programming

• Only works on linear reward functions
• Computationally efficient

Discretizing the space & performing exhaustive search

• Conceptually easy
• More robust
• Computationally expensive


http://people.richland.edu/james/lecture/m116/systems/linear.png


$R(s,a)_1 = 0.9\,\phi_1 + 0.1\,\phi_2$
$R(s,a)_2 = 0.8\,\phi_1 + 0.2\,\phi_2$
$R(s,a)_3 = 0.7\,\phi_1 + 0.3\,\phi_2$
Comparing  Different  Reward  Functions

Predicting  Identity

Experimental  Setup


Goal: To determine attributes of an individual shopping on walmart.com

Procedure: Participant is given 4 profiles of people to shop for and 5 mins
per person to complete the shopping

Data Collected: The sequence of different types of pages viewed

Number of Participants: 30
Experiment's  Markov  Decision  Process

State - The current type of page you are viewing
(e.g. store department page, item list, and
checkout page)

Action - The next type of page selected
Transition - The transition probability is 100%
Reward Function - $R(s,a) = w_a (M - (s_a + 1))$
Performance  metrics  to  predict  user  identity

[Figure: ROC curve for gender; horizontal axis: False Positive Rate]

               AUROC    % Correct
Gender         0.745    67.6%
Arts & Sci     0.718    68.6%
Engineering    0.716    77.9%
Commerce       0.810    86.2%
Total Major    N/A      66.2%
Conclusions


Inverse Reinforcement Learning can be set up in many
different ways

Machine Learning methods can be applied to
attribution

The statistical techniques presented are a good way
to harness the predictive power of Inverse
Reinforcement Learning


Next Steps

To determine consistency of an individual's
reward function

To examine Inverse Reinforcement Learning for
training purposes

Any Questions?
5.17  Social-Cognitive  Biases  In  Simulated  Airline  Luggage  Screening


SOCIAL-COGNITIVE  BIASES  IN  SIMULATED  AIRLINE 

LUGGAGE  SCREENING 

Jeremy  R.  Brown  & Poornima  Madhavan 
Old  Dominion  University 
jrbrown@odu.edu  pmadhava@odu.edu 

Abstract  This  study  illustrated  how  social-cognitive  biases  affect  the  decision  making  process  of  airline  luggage  screeners. 
Participants  (n  = 96)  performed  a computer  simulated  task  to  detect  hidden  weapons  in  200  x-ray  images  of  passenger  luggage. 
Participants saw each image for two (high time pressure) or six seconds (low time pressure). Participants observed pictures of the
“passenger” who owned the luggage. The “pre-anchor group” answered questions about the passenger before the luggage image
appeared, the “post-anchor group” answered questions after the luggage appeared, and the “no-anchor group” answered no
questions.  Participants  either  stopped  or  did  not  stop  the  bag,  and  rated  their  confidence  in  their  decision.  Participants  under  high 
time  pressure  had  lower  hit  rates  and  higher  false  alarms.  Significant  differences  between  the  pre-,  no-,  and  post-anchor  groups 
were  based  on  the  gender  and  race  of  the  passengers.  Participants  had  higher  false  alarm  rates  in  response  to  male  than  female 
passengers. 


1.0  Visual  Search  Tasks 

The  primary  goal  of  visual  search  tasks 
is  to  effectively  differentiate  critical 
signal  stimuli  from  irrelevant  non-signals 
(known  as  distractors).  There  have 
been various studies looking into
different aspects of visual search tasks.
Many of the visual search studies focus
on visual clutter and its effects on the
search task [1, 2, 3, 4]. Another factor
that affects visual search is age [1, 2].
Visual clutter is typically caused by
“distractors”. Studies by Grahame,
Laberge, and Scialfa (2004) [1] and
McPhee,  Scialfa,  Dennis,  Ho,  and  Caird 
(2004)  [2]  found  that  as  clutter  is 
increased,  the  time  it  takes  to  detect  the 
target  also  increased.  They  found  that 
the  task  increased  in  perceived  difficulty 
as  a consequence  of  increased  clutter. 
This  is  because  it  is  harder  to  recognize 
an  object  as  the  clutter  increases  [5]. 

As  there  are  more  objects  to  search 
through  to  find  a target,  the  search  will 
take  longer  and  will  be  less  efficient.  In 
some  instances,  however,  detection 
time can decrease with clutter,
especially when the clutter-causing
objects are of a larger size than the
target [6]. This is due to attention being
drawn to the “empty” space between the
clutter-causing objects.


In addition to the amount of clutter,
search efficiency is affected by what the
clutter consists of and its physical
similarity or dissimilarity to the target.
The more similar the distractor is to the
target, in terms of color, brightness, and
orientation, the more difficult it is to find
the target [3]. Target objects that have
multiple colors or textures are harder to
detect in a cluttered environment,
especially when the clutter is of a similar
color or texture to that of the target [7].

The reason visual search tasks are the
focus of several researchers is that
there are several jobs in the real world
that use visual search as the main
component of the work, such as airport
luggage screening. The primary task for
airline luggage screening requires the
screener to search through an x-ray
image and detect a particular dangerous
target from the clutter of non-lethal
objects. On one level, luggage
screening is a simple signal detection
task where the screener must
differentiate critical signals (or threat
objects) from background noise.
However, the detection task is
complicated by the fact that on several
occasions, the threat object must be
detected within an initial glimpse of the
x-ray image, spanning just a few
seconds. Airport luggage screening is further
complicated by the number and diversity
of threat objects that might potentially be
embedded in a piece of luggage. All the
studies described above have effectively
addressed the cognitive aspects of
visual search in luggage screening at
the level of the individual. However, no
study so far has attempted to address
extraneous issues (social, cultural,
environmental) that might potentially
influence screening efficiency over and
above simple visual search processes.

The  purpose  of  this  study  was  to 
examine what effect, if any, variables
such as race, age and gender of the
passenger have on the screener’s
decisions to stop the passenger’s
luggage or not. Computer simulation
was used instead of observing actual
luggage screeners so that the study
could be more controlled than would be
possible in the real environment.
Simulation  also  allowed  the  study  to  be 
run  using  the  same  luggage  images  for 
several  student  “screeners”  allowing 
comparison  between  different  screeners 
and  luggage  images. 

1.1 Social-Cognitive Biases

1.1.1  Age 

Age  bias  is  a social  bias  related  to  a 
person’s  age  that  can  have  an  effect  on 
decision making. Older people often
tend to be discriminated against for jobs.
Specifically, the belief is that older
individuals are not as flexible in their
thinking as younger individuals.
Therefore, a job that requires flexibility
would not be a good fit for an older
worker [8], whereas younger people are
believed to have more potential for
development than older people [9].
Based  on  this,  younger  people  may  be 
more  likely  to  be  employed  as  airport 
luggage  screeners,  as  their  thinking 
must  be  very  flexible  to  figure  out  what 
constitutes  a target. 


1.1.2  Gender 

When  one  gender  is  given  preferential 
treatment  over  the  other,  it  is  typically 
referred  to  as  “gender  bias”  [10]. 

Gender  bias  is  pervasive  especially  in 
the  workplace.  When  men  and  women 
are  evaluated  for  the  same  type  of  work 
male  workers  are  often  found  to  get 
better  rewards  for  good  evaluations 
compared  to  female  workers;  on  the  flip 
side,  males  also  receive  harsher 
punishments  than  females  in  response 
to  poor  evaluations  [11].  Research  has 
revealed  that  performance  ratings  are 
more  strongly  related  to  promotions  for 
female  workers  compared  to  male 
workers,  which  suggests  that  females 
are  held  to  higher  standards  than  males 
[12]. For example, in one study wherein
men  and  women  were  fired  from  similar 
jobs,  men  received  more  compensation 
than  women  [13]. 

Clearly,  gender-related  biases  play  a 
major  role  when  decisions  to  hire, 
promote or fire are made in several job
contexts. 

1.1.3  Race 

Though we would like to think
differently, racial bias is still prevalent
throughout the world. There have been
numerous studies looking at racial bias
among police and their decision to shoot
or not shoot [14]. In the Correll et al.
(2007) [14] study, comparing police to
civilians in the same district, civilians
were found to be more likely to shoot
when shown a minority suspect
compared with the police. Both police
and civilian participants took longer to
react when the White suspect had a
gun and when the minority suspect did not
have a gun. The researchers concluded
that seeing a White person with a gun
violated people’s expectations, leading
them to take longer to react; the
opposite was true when observing a
person of minority race, who was
perceived as dangerous even without a
weapon [14]. The police officers and
civilians were White, Black, Native
American, and Hispanic, so there was a
mix of races.

2.0  The  Luggage  Screening  Study 

This  study  was  designed  to  examine 
whether  the  social-cognitive  biases 
described  above  would  influence  the 
decision  making  process  in  an  airport 
security  screening  context.  What  makes 
this study unique is the focus on social-
cognitive biases, which differs from
existing studies that have focused on
either the luggage screening process
[15, 16] or on the decisions
made by the luggage screener [17, 18,
19]. This study was designed to
examine  whether  these  biases  will 
influence  active  decision  making  during 
the  luggage  screening  process.  As 
mentioned  earlier,  we  implemented  a 
laboratory-based  experimental  task 
along  with  a luggage  screening 
simulation  to  study  this. 

3.0  Method 

3.1  Participants 

Participants  were  96  Old  Dominion 
University  undergraduates  completing 
the  study  for  class  credit.  The  study 
took  approximately  1 hour  to  complete, 
for  1 hour  of  research  credit. 

3.2  Materials 

Gateway computers running Microsoft
Windows XP with Service Pack 2 were
used. These computers were used to
run a computer simulation of airline
luggage screening created with E-Prime
2.0.

3.3  Procedure 

Participants were randomly assigned to
a control group (n = 24) or to one of three
experimental groups (n = 72 in total) in a
2 (time pressure: high vs. low) X 3 (anchor:
pre-anchor, no-anchor, post-anchor) design.
Participants filled out an entrance
questionnaire prior to running the study.
The task was for participants to detect
the presence of dangerous objects in
x-ray images of passenger luggage.
Participants scanned 200 images
distributed into two blocks of 100
images each. At the beginning of each
block, participants were shown the five
targets that they needed to look for in
the 100 bags that were to follow. For the
experimental groups, the appearance of
the luggage image on each trial was
preceded by the picture of a random
passenger (drawn from a new set of
100 that included the following races:
White, Black, Asian, Middle Eastern,
and Hispanic) to whom the bag
supposedly “belonged”. For each of the
experimental groups, half the
participants performed the task under
high time pressure (2 seconds for each
luggage image exposure) and the other
half performed under low time pressure
(6 seconds for each luggage image
exposure). After deciding whether to
pass the bag or not, participants rated
their confidence in their decision on a
five-point scale.

Participants in the pre-anchor group (n =
24) were first required to respond to two
statements about the passenger before
the x-ray image appeared. After
responding to the statements, they clicked
“next” and the x-ray image was brought
up onto the screen, after which they
rated their confidence in their decision
of whether or not to pass the bag. The
two statements that were used were
statement #1: “I think this person is
attractive” and statement #2: “I will most
likely stop this person’s luggage.” These
two statements appear to be the most
powerful indicators of the existence of
such cognitive biases.

For participants in the no-anchor group
(n = 24), the x-ray image of a bag
appeared beside the passenger 4 seconds
after the passenger’s picture appeared.
These participants were not required to
answer any questions about the
passengers, but they still rated their
confidence in their decision to pass or
not pass the luggage.

For the post-anchor group (n = 24), the
program ran the same experimental
procedure as for the no-anchor group.
However, participants were required to
respond to the two statements used for
the pre-anchor group about each
passenger after they had
chosen whether or not to pass the bag
and rated their confidence in that
decision. Once they had answered the
questions and clicked “next”, the next
picture of a passenger appeared. This
procedure continued until the end of the
trial block.

A control  group  (n  = 24)  performed  the 
screening  task  alone  without  observing 
the  pictures  of  passengers.  Of  these  24 
participants,  12  participants  performed 
under  high  time  pressure  and  the  other 
12  performed  under  low  time  pressure. 
This  group  served  as  a baseline  for 
performance  under  each  level  of  time 
pressure  without  the  additional 
anchoring  information  provided  by  the 
passengers’  pictures. 

The  base  rate  for  the  targets  was  50% 
for  all  groups.  Participants  were  not 
informed  about  the  base  rate.  At  the 
end  of  each  trial,  participants  received 
feedback  in  the  form  of  a text  message, 
telling  them  whether  they  made  a 
correct decision or not. They also
received a cumulative percent correct
score shown after each decision to pass
or not pass the bag. At the end of the
experiment, participants filled out a final
“task knowledge” questionnaire. The
participant  with  the  highest  score  for 
their  experiment  session  received  a 
piece  of  candy  as  a prize. 

4.0  Data  Analysis 

The data was analyzed for normality. If
normality was violated, box plots were
used to identify outliers, and the
outliers were brought to 2 standard
deviations away from the mean. A 2 (time
pressure: high vs. low) X 3 (anchor:
pre-anchor, post-anchor, no-anchor) X 5
(passenger race: White, Black, Asian,
Middle Eastern, Hispanic) X 2 (passenger
gender: male vs. female) mixed
measures ANOVA was run for each
dependent variable. For the interactions
that were significant, a mixed measures
ANOVA was run, followed by paired
t-tests with a Bonferroni correction.
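
The outlier treatment described above can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes that any value lying more than 2 standard deviations from the mean is pulled back to that boundary, whereas the paper identified outliers from box plots first. The function name and the sample values are hypothetical.

```python
from statistics import mean, stdev


def clamp_to_two_sd(values, k=2.0):
    """Pull every value that lies more than k sample standard deviations
    from the mean back to the mean +/- k*SD boundary."""
    centre = mean(values)
    spread = stdev(values)
    lower = centre - k * spread
    upper = centre + k * spread
    return [min(max(v, lower), upper) for v in values]


# Hypothetical false alarm rates for nine participants; the last is extreme.
raw_rates = [0.10, 0.12, 0.11, 0.13, 0.09, 0.12, 0.10, 0.11, 0.60]
cleaned_rates = clamp_to_two_sd(raw_rates)
# Only the extreme value changes: it is moved from 0.60 down to about 0.49.
```

Clamping keeps every participant in the data set, which preserves the cell sizes of the factorial design, at the cost of slightly understating the true spread.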

The  dependent  variables  of  interest 
were: 

• Hit  rate  - the  probability  of  correctly 
detecting  a target. 

• False alarm rate - the probability of an
incorrect detection when there was no
target.

• Sensitivity (d′) - the perceptual ability
to differentiate between a target and
non-target.

• Response criterion setting (c) - the
propensity to generate “yes” or “no”
responses.

The  data  analytic  strategy  was  based  on 
a two-pronged  approach.  We  used  hit 
rate  and  false  alarm  rate  as  pure 
performance  measures  which  directly 
measure  a participant’s  performance  on 
the  task.  In  addition,  we  used  the  signal 
detection  variables  of  sensitivity  and 
response  criterion  setting  to  understand 
the  decision  making  processes  that 
drive performance (resulting in hits and
false alarms).
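
The four dependent variables can be computed from the counts of a yes/no detection task, as in the sketch below. It uses the standard equal-variance signal detection formulas, d′ = z(H) − z(F) and c = −(z(H) + z(F)) / 2; the paper does not state which formulas or software were used, so this is an assumption. The clipping of rates of exactly 0 or 1 to 1/(2N) is one common convention, also assumed here, and the function name and counts are hypothetical.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return hit rate, false alarm rate, sensitivity (d') and response
    criterion (c) for one participant in a yes/no detection task."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    n_signal = hits + misses
    n_noise = false_alarms + correct_rejections
    hit_rate = hits / n_signal
    fa_rate = false_alarms / n_noise
    # z() is undefined at 0 and 1, so clip the rates used for d' and c
    # (assumed convention: replace 0 by 1/(2N) and 1 by 1 - 1/(2N)).
    h = min(max(hit_rate, 0.5 / n_signal), 1 - 0.5 / n_signal)
    f = min(max(fa_rate, 0.5 / n_noise), 1 - 0.5 / n_noise)
    return {
        "hit_rate": hit_rate,
        "false_alarm_rate": fa_rate,
        "d_prime": z(h) - z(f),
        "criterion": -0.5 * (z(h) + z(f)),
    }


# Hypothetical participant: 200 bags with a 50% base rate of targets.
example = sdt_measures(hits=80, misses=20, false_alarms=10, correct_rejections=90)
# d_prime is about 2.12; criterion is about 0.22 (positive, i.e. conservative).
```

A positive c indicates a tendency to say “no” (pass the bag), and a negative c a tendency to say “yes” (stop the bag), independently of how well targets are discriminated.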

5.0  Results 

Due to the complexity of the
experimental design, the analysis was
broken up into two different sets of
variables. Hit rate and false alarm rate
are grouped under “performance
analysis”, and sensitivity and response
criterion setting are grouped under
“signal detection analysis”.

5.1 Performance Analysis

5.1.1  Hit  Rate 

All p values below .05 are treated as
statistically significant. The hit rate data
was normally distributed with no outliers;
therefore, no data cleaning was
necessary.

A 2 (time pressure: high vs. low) X 3
(anchor: pre-anchor, post-anchor,
no-anchor) X 5 (passenger race: White,
Black, Asian, Middle Eastern, Hispanic)
X 2 (passenger gender: male vs.
female) mixed measures ANOVA was
used to analyze the hit-rate data. The
mixed measures ANOVA revealed that
there was a significant main effect for
time pressure (F(1, 66) = 56.18, p <
.001, η² = .46). Participants under low
time pressure had higher hit rates (M =
.82, SE = .01) than the participants
under high time pressure (M = .69, SE =
.01). All other main effects and
interactions were statistically
non-significant.

5.1.2  False  Alarm  Rate 

The data set was not normally
distributed, and the box plots revealed
12 outliers, which were brought in to
within 2 standard deviations of the
mean. This made the data set normally
distributed. A 2 (time pressure: high vs.
low) X 3 (anchor: pre-anchor,
post-anchor, no-anchor) X 5 (passenger
race: White, Black, Asian, Middle
Eastern, Hispanic) X 2 (passenger
gender: male vs. female) mixed
measures ANOVA, similar to that used
for the Hit Rate analysis, was used to
analyze the False Alarm Rate data. The
results of the ANOVA revealed that
there were significant main effects for
passenger gender (F(1, 66) = 7.81, p =
.007, η² = .11) and time pressure (F(1,
66) = 10.80, p = .002, η² = .14).
Participants had a significantly higher
false alarm rate for male passengers (M
= .16, SE = .01) than they did for the
female passengers (M = .13, SE = .01).
Participants under high time pressure
(M = .19, SE = .02) had significantly
more false alarms than did the
participants under low time pressure (M
= .11, SE = .02). All other main effects
and interactions were statistically
non-significant.

5.2  Signal  Detection  Analysis 
5.2.1 Sensitivity: d′

Sensitivity,  also  known  as 
discriminability  index,  is  a measure  of 
how  far  apart  the  signal  and  noise 
curves  are  for  an  individual  (Heeger,  D., 
1997).  In  other  words,  this  implies  that 
the  more  the  signal  (or,  target)  stands 
out from background clutter, the easier
it  will  be  for  the  human  to  locate  the 
target.  So,  in  this  experiment,  higher 
sensitivity  implies  that  it  was  easier  for 
the  participant  to  distinguish  the  target 
from  non-targets.  Specifically,  the 
higher  the  sensitivity,  the  better  was  the 
detection  performance. 

The sensitivity data was normally
distributed with no outliers; therefore, no
data cleaning was necessary. A 2 (time
pressure: high vs. low) X 3 (anchor:
pre-anchor, post-anchor, no-anchor) X 5
(passenger race: White, Black, Asian,
Middle Eastern, Hispanic) X 2
(passenger gender: male vs. female)
mixed measures ANOVA was used to
examine the data obtained for
sensitivity. The main effect of time
pressure (F(1, 66) = 47.34, p < .001, η²
= .418) and the interaction between
passenger gender, passenger race, and
anchor (F(8, 264) = 3.34, p = .001, η² =
.092) were both significant. Under low
time pressure (M = 2.23, SE = .07)
participants had higher sensitivity than
did participants under high time
pressure (M = 1.54, SE = .07).

To further analyze the relationship
between passenger gender and
passenger race within each anchor
group, a 2 (gender) X 5 (race) mixed
measures ANOVA was run within each
of the anchor groups; the results are
described in the following sections.

5.2.1.1 Pre-anchor

The main effects of passenger gender
and passenger race were both
non-significant. The interaction between
passenger gender and passenger race
was significant (F(4, 92) = 2.863, p =
.028, η² = .102). However, sphericity was
violated, and after applying the
Greenhouse-Geisser (p = .063),
Huynh-Feldt (p = .057), or lower-bound
(p = .104) correction the interaction
became statistically non-significant. All
of the other interactions were
non-significant: passenger gender by
time pressure, passenger race by
time pressure, and passenger gender by
passenger race by time pressure.

5.2.1.2 No-anchor

The main effects of passenger gender
and passenger race were both
non-significant. The only interaction
that was found to be significant was the
interaction between passenger gender
and passenger race (F(4, 92) = 2.621, p
= .04, η² = .102). Sphericity was violated;
with the Greenhouse-Geisser (p = .048)
and Huynh-Feldt (p = .04) corrections
the interaction was still statistically
significant. However, using the
lower-bound correction (p = .119)
rendered the interaction statistically
non-significant. All of the other
interactions were non-significant:
passenger gender by time pressure,
passenger race by time pressure, and
passenger gender by passenger race by
time pressure.

The only statistically significant
difference between male and female
passengers was for the White
passengers; participants had higher
sensitivity for detecting targets when the
passengers were male compared to
female (male: M = 1.87, SE = .17;
female: M = 1.45, SE = .16; t = 2.786, p
= .011).

5.2.1.3 Post-anchor

All of the main effects and interactions
were non-significant.

5.2.2 Response Criterion Setting: c

Response  Criterion  Setting  is  the 
propensity  to  generate  “yes”  or  “no” 
responses.  This  means  that  the  human 
sets  an  arbitrary  threshold  or  “cutoff 
point”  for  responding;  when  the  signal  to 
noise  ratio  is  perceived  as  being  above 
this  level,  the  participant  will  indicate  a 
target  is  present.  Likewise,  if  the  ratio  is 
perceived  as  being  below  this  threshold, 
they  will  indicate  that  a target  is  not 
present (Heeger, D., 1997). Typically, if
the participant sets his/her response
criterion high such that the criterion
setting is high or positive, responding is
said to be conservative. This means
that the participant has a propensity to
say “no” more often than “yes”. The
opposite occurs when a participant sets
his/her response criterion low. In such
cases, responding is said to be more
liberal; this will result in low or negative
criterion settings and a general
tendency to say “yes” more frequently
than “no”.

The data set was not normally
distributed, and the box plots revealed
12 outliers, which were brought in to
within 2 standard deviations of the
mean. This made the data set normally
distributed. A 2 (time pressure: high vs.
low) X 3 (anchor: pre-anchor,
post-anchor, no-anchor) X 5 (passenger
race: White, Black, Asian, Middle
Eastern, Hispanic) X 2 (passenger
gender: male vs. female) mixed
measures ANOVA was used to analyze
the response criterion setting data. The
ANOVA indicated a significant main
effect of passenger race (F(4, 264) =
8.48, p < .001, η² = .114) and an
interaction between passenger gender,
time pressure, and anchor (F(2, 66) =
3.50, p = .036, η² = .096). Participants
had significantly more conservative
response criteria for passengers of
Hispanic race (M = 1.19, SE = .09)
compared to all the other races (White
M = .85, SE = .05, t = 3.97, p < .001;
Black M = .83, SE = .05, t = 4.33, p <
.001; Asian M = .82, SE = .05, t = 4.35,
p < .001; Middle Eastern M = .92, SE =
.06, t = 3.14, p = .002). This means that
participants were less likely to say there
was a target present when confronted
with a Hispanic passenger relative to
passengers of other races.
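
Pairwise follow-up comparisons of this kind are the tests that the Data Analysis section says were Bonferroni-corrected. The sketch below shows how such a correction works in general: with m comparisons, each p value is compared against alpha / m (equivalently, multiplied by m and compared against alpha). The function name and the p values are hypothetical placeholders, not the values reported above.

```python
def bonferroni(p_values, alpha=0.05):
    """Apply a Bonferroni correction to a family of p values.

    Returns a list of (raw_p, adjusted_p, is_significant) tuples, where
    adjusted_p = min(1, raw_p * m) and m is the number of comparisons."""
    m = len(p_values)
    threshold = alpha / m
    return [(p, min(1.0, p * m), p < threshold) for p in p_values]


# Hypothetical p values for four pairwise comparisons.
hypothetical_p = [0.0004, 0.0008, 0.002, 0.030]
corrected = bonferroni(hypothetical_p)
# With m = 4 the per-test threshold is .0125, so the first three
# comparisons remain significant and the last one does not.
```

The correction controls the chance of at least one false positive across the whole family of comparisons, which matters when one group is compared against several others.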

To  further  examine  criterion  settings 
within  anchor  groups,  a 2 (gender)  X 2 
(time pressure) mixed measures
ANOVA was run within each of the
anchor groups described below.

5.2.2.1  Pre-anchor  and  Post- 
anchor 

All  of  the  following  main  effects  were 
non-significant:  passenger  gender  and 
time  pressure.  The  interaction  between 
passenger  gender  and  time  pressure 
was  found  to  be  non-significant  as  well. 

5.2.2.2  No-anchor 

For this group, the main effects of
passenger gender and time pressure
were non-significant. The
interaction between passenger gender
and time pressure (F(1, 22) = 8.391, p =
.008, η² = .276) was found to be
statistically significant. One-tailed t-tests
were used for post hoc analysis of the
interaction. The t-tests revealed
no significant difference
in participants’ response criterion
setting for male versus female
passengers under low time pressure.
However, under high time pressure,
criterion setting for male passengers (M
= 1.14, SE = .71) was significantly
higher than for female passengers (M =
.84, SE = .70, t = 2.18, p = .036).

6.0  Discussion 

Most  luggage  screening  studies  to  date 
have  focused  on  either  mechanics  of  the 
luggage  screening  process  [15,  16]  or 
on  the  decision  making  of  luggage 
screeners [17, 18, 19]. What has
seldom  been  addressed  in  these 
studies,  in  particular  the  decision 
making  studies,  is  a consideration  of 
extraneous  factors,  namely  social- 
cognitive  variables,  that  can  affect  the 
decision  making  process.  One  of  these 
factors  is  the  passenger  himself/herself, 
and  any  biases  the  screener  may  have 
towards  the  passengers.  The  purpose 
of  this  study  was  to  examine  whether 
such  social-cognitive  biases  as  gender 
bias,  and  racial  bias  would  influence 
decision  making  during  the  luggage 
screening  process.  We  were  also 
interested  in  examining  the  role  of  time 
pressure,  and  if  the  screening  process 
would  be  affected  by  decision  heuristics 
such  as  anchoring. 

6.1  Role  of  anchoring 

While  time  pressure  played  a significant 
role  in  the  results,  we  found  that 
anchoring  also  played  a significant  role 
in impacting decision making.

Anchoring is the tendency for decision
makers to focus on one particular piece
of information and use it as the basis for
subsequent decisions [20]. The
anchoring heuristic works by giving
people a reference point to help them
make a decision. For example, in an
early experiment on anchoring, participants
were asked, “Is the percentage of
African countries in the United Nations
greater than or less than 25 percent?”
[20]. Participants generally used the “25
percent” as the basis for their judgment of
exactly what percentage of African
countries is in the United Nations. This
worked even when the percentage was
randomly selected in front of the
participant. In general, if an anchor is
present, the anchor can influence the
decision making process of a
participant, and therefore influence
overall performance.

In  this  study,  the  “anchor”  was  a series 
of  questions  drawing  attention  to  the 
passenger  to  whom  the  luggage 
belonged. Results revealed that when
participants had the anchor, either
before (pre-anchor) or after
(post-anchor) they saw the luggage image, it
appeared to suppress rather than
enhance the social-cognitive biases,
relative to the participants in the
no-anchor group. The results also revealed
significant interactions between
cognitive anchoring and race and
gender of passengers on performance.
Contrary to our initial expectations and
hypothesis, this anchoring effect was
particularly salient when time pressure
was low and participants had more time
to ‘attend to’ the passengers.

The  results  suggest  that  participants 
used their personal biases as ‘anchors’
to  help  in  the  decision  making  process, 
particularly  when  they  had  time  to  pay 
more  attention  to  passengers. 

Research  has  revealed  that  minority 
races,  such  as  Hispanics,  have  been 
associated  with  negative  behavioral 
connotations. For instance, studies of
police officers and their decisions to
shoot or not shoot [14] have
demonstrated that police were more
likely  to  shoot  suspects  of  minority  races 
even  when  they  did  not  have  a gun. 

The  higher  hit  rate  associated  with  the 
Hispanic  male  passengers  in  our  study 
could  possibly  be  due  to  the  interaction 
of  these  social-cognitive  biases.  Based 
on  the  surmise  that  the  participant 
already  had  a negative  association  with 
male  members  of  minority  races,  it  is 
possible  that  they  were  more  suspicious 
of  the  two  passenger  categories  (men 
and  minority  races)  during  the  luggage 
screening  process.  Therefore,  when 


searching  through  the  x-ray  image,  they 
perhaps  used  gender  and  race  as 
decision  heuristics,  paid  more  attention 
to  the  items  in  bags  that  were 
accompanied  by  male  passengers  of 
Hispanic  race,  and  detected  the  targets 
more  accurately  when  they  were  indeed 
present.  This  actually  suggests  a 
potential benefit of social-cognitive
biases  in  this  instance!  However,  it 
must  be  noted  that  this  effect  was  only 
observed  under  conditions  of  low  time 
pressure  when  there  was  ample  time  to 
attend  to  the  bags. 

The  existence  of  social-cognitive  biases 
in  detection  behavior  is  supported,  albeit 
in  a slightly  different  manner,  by  the 
false  alarm  analysis  as  well.  Similar  to 
the  effects  found  in  the  hit  rate  data, 
male  Hispanic  passengers  had  a higher 
false  alarm  rate  associated  with  them 
than  female  Hispanic  passengers. 
Interestingly,  the  false  alarm  effect  was 
found  under  conditions  of  high  time 
pressure  rather  than  low  time  pressure. 
This  indicates  the  negative  effects  of 
social-cognitive  biases.  Although  target 
detection  was  benefited  to  an  extent  due 
to  anchoring  under  low  time  pressure, 
high  time  pressure  led  to  negative 
effects  in  the  form  of  higher  false 
alarms. 

Similar  effects  for  racial  bias  were  found 
in  participants’  criterion  settings  wherein 
participants  had  a more  conservative 
response  criterion  setting  for  certain 
passenger  races.  This  means  that 
participants  were  more  conservative  and 
had  to  have  a higher  subjective 
evidence  of  a target  being  present 
before  they  would  indicate  that  one  was 
present.  This  is  very  interesting  since 
we  have  already  seen  in  the  false  alarm 
rate  data  that  participants  also  had  a 
higher  false  alarm  rate  for  the  male 
Hispanic  passengers  compared  to  the 
other  races  of  passengers.  At  first 
glance  the  conservative  criterion  setting 
for Hispanic passengers appears to
contradict the finding that participants
stopped luggage more (i.e., said “target
present” more) in response to these
passengers. Is it possible that
participants’ lower response criterion for
the female Hispanic passengers relative
to male Hispanic passengers raised
the criterion setting for the Hispanic
passengers overall, although this is not
evident in a statistically significant
difference between the male and female
Hispanic passengers per se? As
hypothesized,  the  participants  had 
higher  false  alarm  rates  for  minority 
passengers  than  they  did  for  the  White 
passengers. 

As hypothesized, participants had a
higher false alarm rate when the
passenger was male, which would lead
to male passengers being stopped more
often. The interaction between passenger
gender and time pressure for the
no-anchor group was also an interesting
indication that withholding an anchor
affected performance more than
providing one did in this study.
When time pressure was low,
participants had a more liberal response
to the male passengers, thereby
stopping the luggage belonging to male
passengers more often. Conversely,
participants  had  a more  conservative 
response  towards  the  female 
passengers,  thereby  stopping  their 
luggage  with  lower  frequency  than  for 
male  passengers.  Surprisingly,  the 
opposite  became  true  under  high  time 
pressure;  participants  had  a higher, 
more  conservative  response  to  the  male 
passengers,  while  they  had  a more 
liberal  response  to  the  female 
passengers.  It  is  possible  that  when 
participants  had  time  to  think  about  the 
passenger  and  the  luggage,  as  in  the 
case  of  the  low  time  pressure  group, 
their  biases  against  male  passengers 
were  mitigated  to  an  extent  leading 
them  to  become  more  conservative.  The 
opposite might be true for female
passengers, wherein the index of
suspicion possibly increased with the
availability of more time to scan the
image.

7.0  Conclusions 

The  results  of  this  research  have 
demonstrated  how  social-cognitive 
biases  affect  people  in  the  real  world 
and  how  they  can  subsequently  impact 
the  luggage  screening  process  and 
eventually  national  security.  Through 
the  use  of  computer  simulation  we  have 
shown  that  social-cognitive  biases 
actually  do  have  an  effect  on  the 
detection  of  anomalies  during  luggage 
screening  wherein  decision  makers  use 
these  inherent  biases  as  decision 
heuristics,  particularly  under  conditions 
of  time  pressure.  Clearly,  such  biases 
would  be  difficult  to  detect  through  mere 
observation  of  screening  processes  at 
airports. Hence, the use of behavioral
experiments and computer simulation is
invaluable in such sensitive contexts.

Most  importantly,  our  results  revealed  a 
clear  relationship  between  decision 
making  process  and  performance. 
Through  the  use  of  both  signal  detection 
variables  and  performance  variables  in 
our  analyses,  we  are  able  to  draw 
conclusions  not  just  about  the  impact  of 
social-cognitive  variables  on 
performance,  but  also  the  processes 
that  led  to  the  observed  behaviors.  This 
is  especially  important  in  the  current 
security  conscious  world  we  live  in  and 
for  training  of  personnel  for  optimal 
decision  making  that  is  free  of  biases 
and prejudices. An associated goal of
this research is to inform the design
community about improving the design of
imaging equipment and luggage screening
stations.
The Luggage Screening Study

Purpose

To examine whether social-cognitive biases would
influence active decision making


Hypotheses

Time Pressure and Anchor
High time pressure vs. low
Pre-, No-, and Post-anchor

Passenger Gender and Race
Male passengers will be stopped more
Minority races stopped more
Methods

Participants
96 Old Dominion University students

Materials
Gateway computers running Windows XP SP2
Simulated airline luggage screening

Procedure
Control Group (n=24)
3 Experimental Groups (n=72)

[Flowchart: Control Group Procedure. Only the "View Targets" box is recoverable.]
Experimental Groups

Anchor
“I think this person is attractive.”
“I will most likely stop this person’s luggage.”

Pre-anchor (n=24)
No-anchor (n=24)
Post-anchor (n=24)

[Flowchart: Pre-Anchor Group Procedure. Boxes (original order not recoverable): Demographic Information; Instructions; View Targets; Passenger Picture; Anchor; X-ray Image of Luggage; Pass or Not Pass Luggage; Rate Confidence; Cycle through 100 x-ray images; After 100th x-ray image, see targets again; Debriefed.]

[Flowchart: No-Anchor Group Procedure. Boxes (original order not recoverable): Demographic Information; Instructions; View Targets; Passenger Picture; X-ray Image of Luggage; Pass or Not Pass Luggage; Rate Confidence; Cycle through 100 x-ray images; After 100th x-ray image, see targets again; Debriefed.]

[Flowchart: Post-Anchor Group Procedure. Boxes (original order not recoverable): Demographic Information; Instructions; View Targets; Passenger Picture; X-ray Image of Luggage; Pass or Not Pass Luggage; Anchor; Rate Confidence; Cycle through 100 x-ray images; After 100th x-ray image, see targets again; Debriefed.]
Variables of Interest

Hit  rate  - the  probability  of  correctly  detecting 
a target. 

False  alarm  rate  - probability  of  incorrectly 
saying  there  is  a target  present,  when  there  is 
no  target 

Sensitivity (d') - the perceptual ability to
differentiate  between  a target  and  non-target. 

Response  criterion  setting  (c)  - the  propensity 
to  generate  "yes"  or  "no"  responses. 
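
As a worked illustration of the two signal detection measures defined above, the sketch below computes d' and c from a hit rate and a false alarm rate using the standard equal-variance Gaussian formulas, d' = z(H) - z(F) and c = -(z(H) + z(F))/2. The slides do not state which formulas were used, so the formulas and the function name here are assumptions.

```python
from statistics import NormalDist


def dprime_and_criterion(hit_rate, false_alarm_rate):
    """Return (d', c) under the equal-variance Gaussian model (an assumption).

    Both rates must lie strictly between 0 and 1; inv_cdf raises otherwise.
    """
    z = NormalDist().inv_cdf          # inverse of the standard normal CDF
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    d_prime = z_hit - z_fa            # larger value = better discrimination
    criterion = -0.5 * (z_hit + z_fa) # positive = conservative, negative = liberal
    return d_prime, criterion


# Hypothetical screener: 80% hits, 20% false alarms.
d, c = dprime_and_criterion(0.80, 0.20)
print(round(d, 3), round(c, 3))
```

A symmetric pair of rates such as this one gives an unbiased criterion (c close to zero); raising the criterion lowers both the hit rate and the false alarm rate.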


[Figure: signal detection diagram illustrating sensitivity (d') and the response criterion setting. Axis: internal response. Labeled regions: correct reject, false alarm.]
Results

Performance Analysis
Hit Rate
False Alarm Rate

Signal Detection Analysis
Sensitivity (d')
Response Criterion Setting (c)


Hit Rate

2 (time pressure: high vs. low) X 3 (anchor: pre-anchor,
post-anchor, no-anchor) X 5 (passenger race: White, Black,
Asian, Middle Eastern, Hispanic) X 2 (passenger gender:
male vs. female) mixed measures ANOVA

Time Pressure: F(1, 66) = 56.18, p < .001, η² = .46
Hit Rate

Non-significant main effects and interactions

Source                                                          df        F       Sig.    η²
Main Effects
Passenger Gender                                                1, 66     0.66    0.42    0.01
Passenger Race                                                  4, 264    0.713   0.58    0.011
Interactions
Passenger Gender X Time Pressure                                1, 66     0.17    0.68    0.003
Passenger Gender X Anchor                                       2, 66     0.16    0.86    0.005
Passenger Gender X Time Pressure X Anchor                       2, 66     1.01    0.37    0.03
Passenger Race X Time Pressure                                  4, 264    0.12    0.98    0.002
Passenger Race X Anchor                                         8, 264    0.96    0.47    0.03
Passenger Race X Time Pressure X Anchor                         4, 264    0.71    0.68    0.02
Passenger Gender X Passenger Race                               4, 264    1       0.41    0.015
Passenger Gender X Passenger Race X Time Pressure               4, 264    0.67    0.62    0.054
Passenger Gender X Passenger Race X Anchor                      8, 264    1.89    0.062   0.054
Passenger Gender X Passenger Race X Time Pressure X Anchor      8, 264    1.71    0.095   0.05

Hit Rate: Time Pressure Groups

2 (gender) X 5 (race) X 3 (anchor) mixed measures
ANOVA

High Time Pressure
All main effects and interactions ns

Low Time Pressure
Statistically significant interaction between
passenger gender, passenger race, and
anchor

F(8, 132) = 2.071, p = .043, η² = .112
[Figure: Hit Rate: Anchor Groups. x-axis: Anchor Group (Pre-anchor, No-anchor, Post-anchor); y-axis: hit rate, 0.5 to 0.9.]


False  Alarm  Rate 

2 (time  pressure:  high  vs.  low)  X 3 (anchor:  pre- 
anchor, post-anchor,  no-anchor)  X 5 (passenger 
race:  White,  Black,  Asian,  Middle  Eastern,  Hispanic) 
X 2 (passenger  gender:  male  vs.  female)  mixed 

measures  ANOVA 

passenger gender

F(1, 66) = 7.81, p = .007, η² = .11

time pressure

F(1, 66) = 10.80, p = .002, η² = .14
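
The effect sizes reported on these slides appear consistent with partial eta squared recovered from the F statistic and its degrees of freedom, η²p = F·df1 / (F·df1 + df2). The sketch below checks the two effects just quoted under that assumption; the slides do not say which eta-squared variant was used, and the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values quoted from the false alarm rate analysis above.
gender = partial_eta_squared(7.81, 1, 66)
time_pressure = partial_eta_squared(10.80, 1, 66)
print(round(gender, 2), round(time_pressure, 2))
```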
False Alarm Rate

Non-significant main effects and interactions

Source                                                          df        F       Sig.    η²
Main Effects
Passenger Race                                                  4, 264    0.273   0.9     0.004
Anchor                                                          2, 66     2.01    0.14    0.057
Interactions
Passenger Gender X Time Pressure                                1, 66     1.93    0.17    0.03
Passenger Gender X Anchor                                       2, 66     0.133   0.88    0.004
Passenger Gender X Time Pressure X Anchor                       2, 66     0.171   0.84    0.005
Passenger Race X Time Pressure                                  4, 264    0.454   0.77    0.007
Passenger Race X Anchor                                         8, 264    0.201   0.99    0.006
Passenger Race X Time Pressure X Anchor                         4, 264    0.636   0.75    0.019
Passenger Gender X Passenger Race                               4, 264    1.64    0.16    0.024
Passenger Gender X Passenger Race X Time Pressure               4, 264    1.71    0.15    0.025
Passenger Gender X Passenger Race X Time Pressure X Anchor      8, 264    0.306   0.96    0.009

False Alarm Rate: Time Pressure Groups

2 (gender) X 5 (race) X 3 (anchor) mixed measures
ANOVA

High Time Pressure
gender
F(1, 33) = 8.395, p = .007, η² = .20
gender by race interaction
F(4, 132) = 2.430, p = .051, η² = .07

Low Time Pressure
passenger gender, passenger race, and anchor
F(8, 132) = 2.03, p = .05, η² = .11

[Figure: False Alarm Rate: High Time Pressure. x-axis: passenger race (White, Black, Asian, Middle Eastern, Hispanic).]

False Alarm Rate: Low Time Pressure Interaction

Pre-anchor
2 (gender) X 5 (race) mixed measures ANOVA
passenger gender
F(1, 23) = 5.131, p = .033, η² = .182
passenger gender by passenger race interaction
F(4, 92) = 3.120, p = .019, η² = .119

[Figure: False Alarm Rate: Low Time Pressure interaction.]

No-anchor
passenger gender by passenger race interaction
F(4, 92) = 3.221, p = .016, η² = .12

Post-anchor
No significant main effects or interactions

[Figure: False Alarm Rate: Low Time Pressure interaction.]

False Alarm Rate: Anchor Groups

2 (gender) X 5 (race) X 2 (time pressure) mixed measures
ANOVA

Pre-Anchor
passenger gender by passenger race
F(4, 88) = 3.132, p = .019, η² = .125

No-Anchor
time pressure
F(1, 22) = 6.958, p = .015, η² = .24
passenger gender by passenger race
F(4, 88) = 3.145, p = .018, η² = .125

Post-Anchor
No significant main effects or interactions

[Figure: False Alarm Rate: Anchor Groups (two panels). x-axis: passenger race (White, Black, Asian, Middle Eastern, Hispanic); y-axis: false alarm rate.]
Sensitivity (d')

2 (time pressure: high vs. low) X 3 (anchor: pre-anchor,
post-anchor, no-anchor) X 5 (passenger race: White, Black,
Asian, Middle Eastern, Hispanic) X 2 (passenger gender:
male vs. female) mixed measures ANOVA

time pressure

F(1, 66) = 47.34, p < .001, η² = .418

passenger gender, passenger race, and
anchor

F(8, 264) = 3.34, p = .001, η² = .092


Sensitivity (d')

Non-significant main effects and
interactions

Source                                                          df        F       Sig.    η²
Main Effects
Passenger Gender                                                1, 66     2.99    0.088   0.043
Passenger Race                                                  4, 264    0.707   0.59    0.011
Anchor                                                          2, 66     1.64    0.2     0.047
Interactions
Passenger Gender X Time Pressure                                1, 66     0.329   0.57    0.005
Passenger Gender X Anchor                                       2, 66     0.32    0.73    0.01
Passenger Gender X Time Pressure X Anchor                       2, 66     0.242   0.79    0.007
Passenger Race X Time Pressure                                  4, 264    0.283   0.89    0.004
Passenger Race X Anchor                                         8, 264    0.765   0.634   0.023
Passenger Race X Time Pressure X Anchor                         4, 264    0.601   0.78    0.018
Passenger Gender X Passenger Race                               4, 264    1.22    0.3     0.018
Passenger Gender X Passenger Race X Time Pressure               4, 264    1.65    0.16    0.024
Passenger Gender X Passenger Race X Time Pressure X Anchor      8, 264    1.4     0.2     0.041
Sensitivity (d'): Time Pressure Groups

2 (gender) X 5 (race) X 3 (anchor) mixed measures
ANOVA

High Time Pressure
passenger gender by passenger race by anchor
F(8, 132) = 2.144, p = .036, η² = .115

Low Time Pressure
passenger gender by passenger race by anchor
F(8, 132) = 2.607, p = .011, η² = .136


Sensitivity (d'): Anchor Groups

2 (gender) X 5 (race) X 2 (time pressure) mixed measures
ANOVA

Pre-Anchor
time pressure
F(1, 22) = 24.068, p < .001, η² = .522
passenger gender by passenger race
F(4, 88) = 2.963, p = .024, η² = .119

[Figure: Sensitivity (d'): Anchor Groups. x-axis: passenger race (White, Black, Asian, Middle Eastern, Hispanic).]

No-anchor
time pressure
F(1, 22) = 27.139, p < .001, η² = .458
passenger gender by passenger race
F(4, 88) = 2.56, p = .04, η² = .104

[Figure: Sensitivity (d'): Anchor Groups. x-axis: passenger race (White, Black, Asian, Middle Eastern, Hispanic).]

Post-anchor
time pressure
F(1, 22) = 9.717, p = .005, η² = .306

[Figure: Sensitivity (d'): Anchor Groups.]

Response Criterion Setting (c)

2 (time pressure: high vs. low) X 3 (anchor: pre-anchor,
post-anchor, no-anchor) X 5 (passenger race: White, Black,
Asian, Middle Eastern, Hispanic) X 2 (passenger gender:
male vs. female) mixed measures ANOVA

passenger race

F(4, 264) = 8.48, p < .001, η² = .114

passenger gender, time pressure, and
anchor

F(2, 66) = 3.50, p = .036, η² = .096

[Table: Response Criterion Setting (c), non-significant main effects and interactions; contents not recoverable.]

Response Criterion Setting (c): Time Pressure Groups

2 (gender) X 5 (race) X 3 (anchor) mixed measures ANOVA

High Time Pressure
passenger gender
F(1, 33) = 6.41, p = .016, η² = .163
passenger race
F(4, 132) = 2.46, p = .048, η² = .069
anchor
F(2, 33) = 3.523, p = .041, η² = .176

Low Time Pressure
passenger race
F(4, 132) = 6.56, p < .001, η² = .166

[Figure: Response Criterion Setting (c).]

Response Criterion Setting (c): Anchor Groups

2 (gender) X 5 (race) X 2 (time pressure) mixed measures
ANOVA

Pre-anchor
passenger race
F(4, 88) = 2.837, p = .029, η² = .114

[Figure: Response Criterion Setting (c).]

No-anchor
passenger gender by time pressure
F(1, 22) = 14.054, p = .001, η² = .390
passenger gender by passenger race by
time pressure
F(4, 88) = 14.054, p = .001, η² = .390

[Figure: Response Criterion Setting (c): Anchor Groups (two panels).]


Discussion

Role of time pressure
Role of anchoring

Role of Time Pressure

High time pressure
Fewer hits
Higher false alarms
Less time, can localize but not identify
Higher response threshold for males

Low time pressure
Lower response threshold for male
passengers
Lower response criterion setting


Role of Anchoring

Pre- or Post-anchor
Suppress social-cognitive biases
More salient when time pressure was low

Decision heuristics
Gender
Race

Racial bias: higher false alarm rates for
Hispanics

Implications for Training and Design

Luggage screening stations
Increase number of screeners
Block luggage screeners' view of passengers

Social-Cognitive Biases
Can be mitigated
Create training programs


Limitations and Conclusions

Limitations
Participants
Simulation vs. real world screening
Tangible consequences

Conclusions
Effects of social-cognitive biases
Design and training
Future Questions

Contact email: jrbrown@odu.edu
5.18 EEG Artifact Removal Using a Wavelet Neural Network

Hoang-Anh  T Nguyen,  John  Musson,  Jiang  Li  and  Frederick  McKenzie 

Old  Dominion  University 

Hnguv()25'fliodu.cdu  iinuss003aodu.cdu  JLiigjodu.cdu  rdnickcn/j^/iodu.cdu 
Guangfan  Zhang  and  Roger  Xu 
Intelligent  Automation,  Inc. 
azhana@i-a-i.cQm  haxu@i-a-i.com 

Carl  Richey  and  Tom  Schnell 
University  of  Iowa 

richevc@ccad.uiowa.edu  lhomas-schnell@uiowa.edu 

Abstract- In this paper, we develop a wavelet neural network (WNN) algorithm for EEG artifact removal without EOG recordings.
The algorithm combines the universal approximation characteristics of neural networks with the time/frequency properties of wavelets.
We compare the WNN algorithm with the ICA technique and a wavelet thresholding method, which was realized using
Stein's unbiased risk estimate (SURE) with an adaptive gradient-based optimal threshold. Experimental results on a driving test data
set show that WNN can remove EEG artifacts effectively without diminishing useful EEG information, even for very noisy data.


1.0  INTRODUCTION 

Electroencephalogram  (EEG)  recordings 
are  known  to  be  contaminated  by 
physiological  artifacts  from  various  sources, 
such  as  eye  blinking  or  movements,  heart 
beating  and  movements  of  other  muscle 
groups  [1].  Such  types  of  artifacts  are  mixed 
together  with  the  brain  signals,  making 
interpretation of EEG signals difficult [2].

Eye  movements  or  blinks  usually  produce 
large  electrical  potentials,  which  spread 
across  scalp  and  contaminate  EEG 
recordings.  This  class  of  potential  generates 
significant  electrooculographic  (EOG) 
artifacts  in  the  recorded  EEG.  Removal  of 
EOG  artifacts  is  nontrivial  because  these 
artifacts  spread  across  the  scalp, 
contaminate  and  overlap  in  frequency  with 
the  EEG.  The  effect  of  EOG  artifacts  on 
EEG  activity  is  found  most  significantly  in 
low  frequency  bands:  Delta  (1-4Hz),  Theta 
(4-8 Hz) and Alpha (8-13 Hz) [3]. Eye
blinking generates spike-like shapes
whose peaks can reach up to 800 µV and
occur in a very short period, 200-400 ms [4].
Meanwhile, artifacts generated by eye
movements  are  square-shaped,  smaller  in 
amplitude  but  last  longer  in  time, 
corresponding  to  lower  frequency 
components  [5]. 


In recent years, there has been an
increasing interest in applying various
techniques to remove ocular artifacts from
EEG [1, 2, 5, 6, 8, 10, 13, 14-19].
Methods for removing EOG artifacts based
on regression in the time domain or frequency
domain [8] have been widely studied. All
regression methods, in both the time and
frequency domains, rely on EOG
recordings, which, however, are not always
available. Furthermore, these methods
usually eliminate the neural potentials that
are common to reference electrodes and
other frontal electrodes.

Berg and Scherg [10] proposed a principal
component analysis (PCA) based technique
for removing ocular artifacts. In this method,
EEG and EOG signals were collected
simultaneously. It was observed that PCA of the
variance in these signals produced major
components corresponding to various eye
blinks and eye movements. The artifacts
were removed by eliminating these
contaminated components. Their
experiments showed that PCA removes
artifacts more effectively than
regression-based models. However, PCA models
usually failed to completely separate
artifacts from cerebral activity [11], and the
orthogonality assumption for data components
in PCA is rarely satisfied [5].

Independent component analysis (ICA)
was developed for blind source
separation problems, in which mixtures
are decomposed into their original
sources without any a priori knowledge
about the mixing process or the properties of
those sources. It has been used as an
alternative method for EEG artifact removal
[1, 12-14]. ICA for artifact removal usually
requires a large amount of data and manual
visual inspection to eliminate noisy
independent components, making the
method time-consuming and unsuitable for
real-time applications.

Recently, wavelet-based methods [14-19]
for EEG artifact removal have received
significant attention. Wavelet analysis has
been  used  as  an  effective  tool  for 
measuring  and  manipulating  non-stationary 
signals  such  as  EEG.  It  provides  flexible 
controls  over  the  resolution  with  which 
neuroelectric  components  and  events  can 
be  localized  in  time,  space,  and  scale.  The 
biggest advantage of using this method for
EEG correction is that it relies on
neither a reference EOG signal nor visual
inspection.

This  paper  proposes  a novel,  robust,  and 
efficient  technique  to  remove  EEG  artifacts 
by  combining  the  approximation  capabilities 
of  both  wavelet  and  neural  network 
methods. The method can be described
briefly as follows: (1) contaminated
EEG  signals  are  first  decomposed  to  a set 
of  wavelet  coefficients,  (2)  low  frequency 
wavelet  sub-band  coefficients  are  then 
passed  through  and  corrected  by  a trained 
neural  network  and  (3)  the  corrected 
coefficients  are  used  to  reconstruct  clean 
EEG  signals.  The  method  was  applied  to 
correct  EEG  data  contaminated  by  ocular 
artifacts  and  compared  with  other  state-of- 
the-art  methods  including  ICA  and  a wavelet 
thresholding  method. 

The rest of the paper is organized as
follows: Section 2 reviews related
work. Section 3 presents the proposed
technique. Section 4 describes the
experimental settings. Section 5 presents
some of the achieved results. Section 6
discusses the results, and
Section 7 concludes the paper.

2.0  RELATED  WORK 

2.1  EEG  model 

We assume the following model for the
contaminated EEG signal:

$EEG_{rec}(t) = EEG_{true}(t) + k \cdot EOG(t)$

where $EEG_{rec}(t)$ is the recorded, contaminated
EEG, $EEG_{true}(t)$ denotes the true EEG
signal, and $k \cdot EOG(t)$ represents the
ocular artifact propagated from the eye to the
recording site. The ultimate purpose of any
artifact removal technique is to recover
$EEG_{true}(t)$ from $EEG_{rec}(t)$.

2.2  Wavelet  thresholding 

Wavelet  thresholding  technique  is  built  on 
the  multiresolution  analysis  of  wavelet 
transform,  a tool  that  analyses  signal  in 
different  time  and  frequency  components 
[20].  These  components,  called 
approximations  and  details,  are  further 
processed  by  thresholding  before 
reconstruction  [14]-[18].  By  selecting  a 
‘good’  mother  wavelet,  which  resembles  the 
shapes  of  the  artifacts,  large-valued 
coefficients  are  generated  in  the  areas 
corresponding  to  the  EEG  artifacts  at  low- 
frequency  sub-bands  and  are  considered  as 
an  estimate  of  the  ocular  artifacts.  Thus, 
shrinking  the  amplitude  range  of  these 
coefficients  by  nonlinear  thresholding 
functions  would  remove  those  artifacts.  In 
this paper, a wavelet thresholding method
was implemented as follows:

a. Use a Butterworth lowpass filter to
smooth the EEG signal before further
processing

b. Apply the wavelet transform to the
contaminated EEG signal

c. Utilize a thresholding function to
automatically correct high-valued
coefficients at low-frequency sub-bands

d.  Reconstruct  the  corrected  EEG  signal 
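
Steps b to d above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it uses a Haar wavelet, a fixed hand-picked threshold, and simple clipping of large coefficients, whereas the paper uses a SURE-based adaptive threshold and a Butterworth pre-filter (step a, omitted here). All function names are hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)


def haar_dwt(signal):
    """One Haar level: returns (approximation, detail). Length must be even."""
    approx = [(signal[i] + signal[i + 1]) * S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * S for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * S, (a - d) * S])
    return out


def wavedec(signal, levels):
    """Multi-level decomposition; details[0] is the finest band."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    return approx, details


def waverec(approx, details):
    """Rebuild the signal from the coarsest band upward."""
    for d in reversed(details):
        approx = haar_idwt(approx, d)
    return approx


def clip_large(coeffs, threshold):
    """Shrink coefficients whose magnitude exceeds the threshold (artifact estimate)."""
    return [math.copysign(threshold, c) if abs(c) > threshold else c for c in coeffs]


def remove_artifacts(signal, levels=3, threshold=5.0):
    """Correct only the low-frequency bands; len(signal) must be divisible by 2**levels."""
    approx, details = wavedec(signal, levels)
    approx = clip_large(approx, threshold)
    details[-1] = clip_large(details[-1], threshold)  # coarsest detail band
    return waverec(approx, details)


# Toy signal: small fast oscillation plus a large, slow blink-like bump.
eeg = [0.5, -0.5] * 8
for i in range(4, 8):
    eeg[i] += 40.0
cleaned = remove_artifacts(eeg)
print(max(abs(v) for v in eeg), max(abs(v) for v in cleaned))
```

In this toy case the fine-scale oscillation passes through untouched because only the coarse bands are modified, which is the property the thresholding approach relies on.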
2.3 Independent Component Analysis

Independent component analysis was first
proposed by Herault and Jutten at a
meeting in Snowbird, Utah, in 1986 [1, 11] to
solve the blind source separation (BSS)
problem. ICA aims at recovering independent
source signals $s = \{s_1(t), s_2(t), \ldots, s_N(t)\}$
from recorded mixtures $x = \{x_1(t), x_2(t), \ldots, x_N(t)\}$
produced by an unknown mixing matrix $A$ of
full rank. The basic problem of ICA is to
estimate the mixing matrix $A$ or,
equivalently, the original independent
sources $s$ based on the linear
relationship $x = As$ when no knowledge is
available about the sources or the mixing
process. The method rests on several
assumptions: the sources are statistically
independent, the independent components
have non-Gaussian distributions, and the
matrix $A$ is square and invertible. ICA
identifies an unmixing matrix $W$ that
decomposes the multi-channel scalp data
into a sum of temporally independent and
spatially fixed components. ICA finds
$u = Wx$, where the rows of the output data
matrix represent time courses of activation
of the ICA components [1, 9, 11]. Several
algorithms have been proposed to
implement ICA, such as the information
maximization approach (InfoMax),
fixed-point ICA, Joint Approximate
Diagonalization of Eigenmatrices (JADE),
and Second Order Blind
Identification (SOBI). In this research, the
InfoMax algorithm was used to perform
EEG artifact removal.
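
The relationships x = As and u = Wx can be illustrated with a two-channel toy example. The sketch below is not an ICA implementation: it assumes the mixing matrix is known and simply inverts it, whereas InfoMax and the other algorithms estimate W from the data alone. The matrix values, signal values, and function names are all hypothetical.

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix; assumes a non-zero determinant."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[1.0, 0.8],   # hypothetical mixing: channel 1 picks up most of the eye source
     [0.2, 1.0]]
brain = [0.3, -0.1, 0.4, -0.2]   # toy "cerebral" source
eye = [0.0, 5.0, 5.0, 0.0]       # toy blink source

# Forward model x = A s, one sample at a time.
mixed = [mat_vec(A, [b, e]) for b, e in zip(brain, eye)]

# Unmixing u = W x with W = A^-1 (known here, estimated in real ICA).
W = inverse_2x2(A)
components = [mat_vec(W, x) for x in mixed]

# Artifact removal: zero the eye component and project back to the channels.
cleaned = [mat_vec(A, [u[0], 0.0]) for u in components]
print(cleaned[1])
```

The back-projection step is what makes component rejection usable in practice: the cleaned data stay in channel space, with only the contribution of the rejected component removed.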

3.0  PROPOSED  METHOD 

In this paper, we present a novel algorithm,
the Wavelet Neural Network (WNN), for EEG
artifact removal. In our method, the WNN is
trained with simulated data resembling the
properties of the EEG signal in both the time
and frequency domains. The trained WNN is
then used as the corrector for contaminated
data. In both the training and testing processes,
the original signal is first decomposed with a
wavelet to obtain different frequency
components. The low-frequency sub-band
coefficients are then interpolated so that they
have the same length. A trained artificial neural
network (ANN) is fed with these interpolated
inputs and yields the corrected coefficients at
its outputs. Finally, the corrected
coefficients are downsampled for
wavelet reconstruction to obtain the corrected
signal y from the original contaminated signal x,
as shown in Figure 1.

The core idea of the method, decomposing
the signal in both the time and frequency
domains with a wavelet and using an ANN to
correct the coefficients, can be viewed in a more
succinct (and perhaps more precise) way:
by combining the time/frequency property of
wavelets and the universal approximation
capability of neural networks, we are
able to keep useful information related to
cognitive activities while eliminating artifacts
in the EEG.

Figure 1. Proposed Wavelet Neural Network Structure.

3.1  EEG  data  simulation 

As described in [23], an EEG signal can be
simulated based on three assumptions: (1)
short segments of the spontaneous EEG
can be described as linearly filtered noise, (2)
non-stationary components in the
spontaneous EEG can be simulated by
changing the characteristics of this filtering
process, and (3) the spectral properties of the
simulated EEG data resemble those of the
actual signal. As shown in Figure 2, a set of
Gaussian noise (GN) signals was generated and
then filtered by a number of lowpass and
bandpass filters whose cut-off
frequencies match the spectral
properties of the EEG frequency bands.
Transients such as eye blinks and eye
movements, collected from real signals, were
then lowpass filtered and added to
contaminate the simulated data.
The cutoff frequencies for these filters are
summarized in Table 1.


Table 1. EEG Frequency Band
Specifications

Freq. band    Lower (Hz)    Upper (Hz)
Delta         0.5           4
Theta         4             8
Alpha         8             13
Beta-1        13            20
Beta-2        20            30
Gamma         30            50

3.2  Neural  Network  Training 

Backpropagation (BP) is used as the
learning algorithm for a multi-layer
perceptron (MLP) neural network.
Experimental results show that a
one-hidden-layer network with a 3-5-3 structure
(3 inputs, 5 hidden units, and 3 outputs) is
sufficient for removing EEG ocular artifacts.
The ANN's inputs are the low-frequency sub-band
coefficients of the wavelet-decomposed
simulated data, and its target outputs are the
same coefficients after correction.
In this paper, the
number of iterations for ANN training is set
at 200, but this number might be tuned
to improve the training accuracy.
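
A 3-5-3 perceptron trained by backpropagation for 200 iterations can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tanh hidden units, linear outputs, learning rate, weight initialization, and toy data (inputs mapped to clipped copies of themselves, loosely mimicking coefficient correction) are all assumptions.

```python
import math
import random


def init_mlp(n_in=3, n_hidden=5, n_out=3, seed=0):
    """Random weights; the last column of each row is the bias."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    return w1, w2


def forward(w1, w2, x):
    """Return (hidden activations, outputs) for one input vector."""
    xb = list(x) + [1.0]
    h = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in w1]
    hb = h + [1.0]
    y = [sum(w * v for w, v in zip(row, hb)) for row in w2]
    return h, y


def train_step(w1, w2, x, target, lr):
    """One backpropagation update on a single sample (squared-error loss)."""
    h, y = forward(w1, w2, x)
    xb, hb = list(x) + [1.0], h + [1.0]
    err = [yi - ti for yi, ti in zip(y, target)]
    # Hidden-layer deltas, computed with the output weights before they change.
    delta_h = [(1.0 - h[j] ** 2) * sum(err[k] * w2[k][j] for k in range(len(w2)))
               for j in range(len(h))]
    for k, row in enumerate(w2):
        for j in range(len(row)):
            row[j] -= lr * err[k] * hb[j]
    for j, row in enumerate(w1):
        for i in range(len(row)):
            row[i] -= lr * delta_h[j] * xb[i]


def mse(w1, w2, data):
    """Mean over samples of the summed squared output error."""
    return sum(sum((yi - ti) ** 2 for yi, ti in zip(forward(w1, w2, x)[1], t))
               for x, t in data) / len(data)


def train(data, iterations=200, lr=0.01, seed=0):
    w1, w2 = init_mlp(seed=seed)
    for _ in range(iterations):
        for x, t in data:
            train_step(w1, w2, x, t, lr)
    return w1, w2


# Toy data: "contaminated" coefficients in, clipped coefficients out.
rng = random.Random(1)
inputs = [[rng.uniform(-3, 3) for _ in range(3)] for _ in range(40)]
data = [(x, [max(-1.0, min(1.0, v)) for v in x]) for x in inputs]
w1, w2 = train(data)
print(mse(w1, w2, data))
```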

3.3  Performance  metric 

We use two metrics, power spectral
density (PSD) and frequency correlation, to
assess the proposed method. The PSD is a
widely used metric that shows the power of
the EEG signal at specific frequencies.
Calculating the correlation in the
frequency domain before and after artifact
removal is equivalent to calculating the
correlation in the time domain after
filtering the time series with the
corresponding frequency filter [14]-[22].
The frequency correlation between x and y
is computed as

$$ c = \frac{\frac{1}{2}\sum_{\omega=w_1}^{w_2}\left(x^{*}y + y^{*}x\right)}{\sqrt{\sum_{\omega=w_1}^{w_2} x x^{*}\,\sum_{\omega=w_1}^{w_2} y y^{*}}} $$

where x and y are the frequency-domain
representations of the two signals, *
denotes complex conjugation, w1 and w2 are
the lower and upper limits of the power
spectrum region of interest, and c is the
correlation value assigned to the frequency
(w1+w2)/2. If x and y are identical, c
equals 1; otherwise, c takes a value
between 0 and 1. In this paper, the
‘window size’, w2-w1, is set to 2.
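The metric above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the naive DFT, the function names, the 64 Hz sampling rate, and the sinusoidal test signals are all hypothetical choices made for the example, and the formula implemented is the normalized band-limited cross-spectrum, c = (1/2) Σ(x*y + y*x) / sqrt(Σ xx* · Σ yy*), summed over the window from w1 to w2.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform (adequate for short examples)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def frequency_correlation(x, y, fs, w1, w2):
    """Correlation between x and y restricted to the band [w1, w2] in Hz."""
    n = len(x)
    X, Y = dft(x), dft(y)
    # Keep only the non-negative-frequency bins that fall inside the window.
    bins = [k for k in range(n // 2 + 1) if w1 <= k * fs / n <= w2]
    num = 0.5 * sum((X[k].conjugate() * Y[k] + Y[k].conjugate() * X[k]).real
                    for k in bins)
    den = math.sqrt(sum(abs(X[k]) ** 2 for k in bins)
                    * sum(abs(Y[k]) ** 2 for k in bins))
    return num / den if den > 0 else 0.0


# Hypothetical signals: a "clean" mix of 5 Hz and 20 Hz, plus a 2 Hz "artifact".
fs, n = 64, 64
t = [i / fs for i in range(n)]
clean = [math.sin(2 * math.pi * 5 * ti) + 0.5 * math.sin(2 * math.pi * 20 * ti)
         for ti in t]
contaminated = [c + 2.0 * math.sin(2 * math.pi * 2 * ti) for c, ti in zip(clean, t)]

# Window size w2 - w1 = 2, so each value is assigned to the window centre.
print(frequency_correlation(clean, clean, fs, 4, 6))           # centre 5 Hz
print(frequency_correlation(contaminated, clean, fs, 19, 21))  # centre 20 Hz
```

Both printed values should be close to 1: the first compares a signal with itself, and the second looks at a band that the low-frequency artifact does not touch. Sliding the window across the spectrum gives a correlation curve of the kind plotted in the frequency correlation figures.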

4.0  EXPERIMENTS 

4.1  Datasets 

We  validate  our  method  on  a data  set, 
which  was  collected  when  participants  were 
performing  a driving  test.  The  EEG 
information  was  collected  by  a 128-channel 
recording  system  at  the  sampling  rate  of 
1000  Hz  along  with  other  information 
including  description  of  the  task,  system 
dynamics  related  information,  performance 
measures,  physiological  signals  (ECG, 
respiration,  etc.),  and  eye  tracking.  The 
workload  was  also  analyzed  according  to 
the  driving  conditions  (city-driving,  stopped, 
highway passing, etc.). Because of the
recording conditions, the subjects' eye
movements and blinks occur frequently, so
the data, especially at the frontal
recording channels, are highly contaminated
by ocular artifacts.

4.2  Experimental  settings 

We implemented three artifact removal
methods for comparison: the ICA method,
the wavelet thresholding algorithm and the
proposed WNN technique. For each
algorithm, we computed the PSD and
frequency correlation before and after
artifact removal to illustrate the
effectiveness of each of the algorithms.
For the proposed method, we first
simulated EEG signals to train an ANN and
then tested the trained model on a
simulated signal and the driving test data
set. We implemented the wavelet
thresholding method by following the
instructions in [20], and for ICA we used
the EEGLAB software.

5.0  RESULTS 


Figure 3. Clean and Contaminated
Simulated Signal for (a) Training and (b)
Testing.

For the proposed WNN algorithm, two
simulated segments, each 5 seconds long at
a sampling rate of 1000 Hz, were created
for training and testing, as shown in
Figures 3a and 3b, respectively. The
artifacts were taken from the driving test
data set and added to the simulated data
segments. The data in Figure 3a were then
used to train the neural network in the
proposed WNN algorithm. We applied the
trained WNN model to the testing data
segment (Figure 3b), and the corrected EEG
signal is shown in Figure 4. Figure 5 shows
the PSD of the contaminated, corrected and
clean EEG signals. Figure 6 shows
frequency correlations among those signals.


Figure 4. Contaminated Simulated and
WNN Corrected Signals


Figure  5.  PSD  of  Clean,  Contaminated 
and  WNN  Corrected  Signals  for  Testing 

For the ICA algorithm, a computer equipped
with an Intel(R) Core(TM) 2 CPU 6400
@ 2.13 GHz and 2.00 GB of RAM took 27
minutes and 382 steps to remove the
artifacts from one EEG segment in the
driving data set.


Figure 6. Frequency Correlation between
(a) Contaminated and Corrected
Simulated Signals and (b) Clean and
Corrected Simulated Signals.

We then applied the trained WNN model to
the driving test data set. We decomposed
the EEG signal into 8 levels, and the
coefficients of 3 low-frequency sub-bands
were corrected by the WNN algorithm.
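To see roughly which frequencies an 8-level decomposition isolates at a 1000 Hz sampling rate, the nominal band edges of a dyadic wavelet decomposition can be computed as in the sketch below. The function name is hypothetical, the band edges are the idealized values (real wavelet filters overlap), and the paper does not state which three sub-bands were corrected, so reading them as the three lowest (A8, D8, D7) is an assumption.

```python
def dwt_subbands(fs, levels):
    """Nominal frequency range (Hz) of each sub-band of a dyadic decomposition.

    Detail band D_j spans fs / 2**(j + 1) to fs / 2**j; the final
    approximation band A_levels spans 0 to fs / 2**(levels + 1).
    """
    bands = []
    for j in range(1, levels + 1):
        bands.append((f"D{j}", fs / 2 ** (j + 1), fs / 2 ** j))
    bands.append((f"A{levels}", 0.0, fs / 2 ** (levels + 1)))
    return bands


for name, low, high in dwt_subbands(fs=1000, levels=8):
    print(f"{name}: {low:.3f} to {high:.3f} Hz")
```

Under these assumptions, A8 covers about 0 to 1.95 Hz, D8 about 1.95 to 3.91 Hz, and D7 about 3.91 to 7.81 Hz, which together overlap the delta and theta bands of Table 1, where ocular artifacts are usually concentrated.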

The wavelet thresholding method was used
to adaptively correct 4 low-frequency
sub-band coefficients. For specific data
segments, the corrections were repeated a
number of times with various wavelets and
at different levels of decomposition in
order to obtain the most acceptable
corrected data. Wavelets from the Coiflet
and Daubechies families were chosen because
experiments show that they can extract
the features of artifacts efficiently.

Figure 7 shows PSD plots for one sample
segment of the driving test data after
artifact removal by the three algorithms.
Figure 8 shows the segment in the time
domain. Figure 9 shows frequency
correlations between the contaminated and
corrected segments.

Figure 7. PSD of Contaminated and
Decontaminated EEG


Figure 8. Contaminated and
Decontaminated EEG (a) Contaminated,
ICA and WNN Corrected EEG (b)
Contaminated, Wavelet Thresholding
and WNN Corrected EEG

Figure 9. Frequency Correlation between
Contaminated and Decontaminated EEG,
(a) by ICA, (b) by Wavelet Thresholding
and (c) by WNN

6.0 DISCUSSION

The results show that the WNN algorithm
removes ocular artifacts efficiently while
preserving cerebral background
information. Like wavelet thresholding,
WNN needs data from only a single channel
to perform the correction, which gives it
an advantage over ICA, which must be run
on the whole data set. Furthermore,
repeated experiments on various data
segments showed the method to be effective
and stable, which is not the case for the
wavelet thresholding algorithm.

The  PSD  plot  shows  that  the  low  frequency 
components  are  reduced  significantly  in  the 
corrected  signal.  That  is  more  evident  if  we 
look  at  the  frequency  correlation  metric  plot 
between  contaminated  and  corrected 
signals:  there  is  a slight  difference  in  the 
range  of  low  frequency  components  while  in 
other  ranges,  the  useful  information  is  well- 
preserved. 

The frequency correlation plots also show
that the correction made by ICA spreads
over the entire frequency range and that
the power of the low-frequency components
is not reduced significantly. Meanwhile,
the low-frequency components in the signal
were attenuated by wavelet thresholding
and WNN, while the high-frequency
components are well preserved by both.

ICA requires considerably more computing
power and multi-channel data for artifact
removal. It also demands either an
automatic or a manual step to determine
which independent components are
artifacts, making an online implementation
of ICA difficult.

7.0  CONCLUSIONS 

We proposed a novel algorithm, WNN, for
artifact removal from EEG signals. The
algorithm combines the time/frequency
property of wavelets and the approximating
capability of neural networks to locate
and eliminate artifacts. Experimental
results on a driving data set show that
WNN can effectively remove artifacts and
achieve better results than the wavelet
thresholding algorithm. WNN is also much
more computationally efficient than the
ICA algorithm, making an automatic online
implementation possible.

EEG and artifacts

• Electroencephalogram (EEG):

- Neural electrical signal

- Recorded by using recording system

- Important to many application fields:
computer control and communication,
entertainment, education, military,
commercial, etc.

• Artifacts:

- Unavoidable non-cortical activities

- Sources: muscle activity, line noise,
heartbeat, eye movements and blinks, etc.

• Electrooculogram (EOG) artifact:

- Generated by eye movements or blinks

- Main artifactual portion of EEG recordings

Wavelet thresholding

Contaminated EEG -> Low-pass filtering -> Wavelet transform ->
Thresholding -> Wavelet reconstruction -> Corrected EEG


Threshold  function[18]: 


w + t , w<t 

2k*l 
W' + t + , w > t

2k+l 


Low-pass  filter: 

- Butterworth 

Wavelet  basis  function  selection: 

- Daubechies and Coiflet

- Sensitive  to  time/frequency 
properties  of  EEG  waves 


Independent  Component 
Analysis 


• Notations: 

- X original  mixtures 

- s blind  source  matrix 

- u estimated  source 

- A mixing  matrix 

- W un-mixing  matrix,  inverse  of  A 

• ICA  assumptions: 

- Source  independence 

- Non-Gaussianity

- A and  W to  be  square  and  invertible 

• Source  independence  definition: 

- Minimizing  mutual  information 

- Maximizing non-Gaussianity


s -> [Unknown mixing process] -> X -> [Blind Source Separation] -> u
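The notation on this slide can be made concrete with a two-source toy example in Python: the mixtures are X = A s, and the estimated sources are u = W X. The sketch only illustrates the notation. The matrix values are hypothetical, and W is obtained here by directly inverting a known A, whereas an actual ICA algorithm has to estimate W from the mixtures alone.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix; ICA assumes A is square and invertible."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix must be invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[1.0, 0.5], [0.3, 1.0]]  # hypothetical mixing matrix
s = [2.0, -1.0]               # hypothetical source values at one time instant
x = matvec(A, s)              # observed mixture, X = A s
W = inverse_2x2(A)            # un-mixing matrix, W = inverse of A
u = matvec(W, x)              # estimated sources, u = W X

print("mixture:", x)
print("recovered sources:", u)
```

With these values the recovered vector u matches s up to floating-point rounding. In artifact removal, the components judged to be ocular would be zeroed before mixing back to the channel space.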

ICA in EEG artifact removal

Being a batch algorithm, ICA must be run on the whole data set with an
adequate number of data points, so it is computationally expensive.

Wavelet Neural Network


Figure 2. Wavelet Neural Network structure

Network Training

EEG data simulation

Simulated EEG

Network testing

Experiments


Data  set: 


- Driving  test 

- 128  recording  channel  system 

- Highly  disturbed  by  multi-type  artifacts 

Experimental  settings: 

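- Methods: WNN, WT and ICA

- Validation metrics: PSD and frequency correlation

• Frequency correlation mathematical formula:

$$ c = \frac{\frac{1}{2}\sum_{\omega=w_1}^{w_2}\left(x^{*}y + y^{*}x\right)}{\sqrt{\sum_{\omega=w_1}^{w_2} x x^{*}\,\sum_{\omega=w_1}^{w_2} y y^{*}}} $$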


MODSIM WORLD
Conference & Expo


Results  - Simulated  EEG 


Figure 5. WNN performance on simulated data

Results - Simulated EEG


Clean  testing  signal 

Corrected  by  WNN 

Corrected  by  Wavelet  Thresholding 


Figure 7. Clean simulated and Decontaminated signals by
WNN  and  Wavelet  Thresholding 

Error signals

Technique               RMSE
WNN                     12.2473
Wavelet Thresholding    16.4942

Figure 8. Error signals, or differences between the ‘ground
truth’ and signals corrected by WNN and WT
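The RMSE values in the table compare each corrected signal against the clean 'ground truth' simulated signal. A minimal Python sketch of that computation follows. The function name and the sample values are hypothetical and are not taken from the paper's data.

```python
import math


def rmse(reference, estimate):
    """Root-mean-square error between a reference signal and an estimate."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    if not reference:
        raise ValueError("signals must not be empty")
    squared = [(r - e) ** 2 for r, e in zip(reference, estimate)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical samples: a clean segment and a corrected version of it.
clean = [0.0, 1.0, 0.0, -1.0]
corrected = [0.5, 0.5, -0.5, -0.5]
print(rmse(clean, corrected))
```

A lower RMSE means the corrected signal lies closer to the ground truth, which is how the table ranks WNN ahead of wavelet thresholding on the simulated data.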

PSD and frequency correlation


Figure  9.  PSD  (left)  and  frequency  correlation  between  contaminated  and 
corrected  simulated  signals  (center)  and  clean  and  corrected  simulated 
signals  (right) 

Results - Real EEG


- Contaminated EEG

- Corrected by Wavelet Thresholding
- Corrected by WNN


Figure  10.  Contaminated,  Wavelet  Thresholding  and 
WNN  Corrected  EEG 

Results - Real EEG


- Contaminated EEG

- Corrected by ICA

- Corrected by WNN


Figure 11. Contaminated, ICA and WNN Corrected EEG

Power Spectrum Density


Figure  12.  PSD  of  Contaminated  and  De-contaminated  EEG 

Frequency correlation


Figure 13. Frequency Correlation between Contaminated and
Decontaminated EEG, (a) by ICA, (b) by Wavelet Thresholding and (c)
by WNN

Discussions


Techniques  comparison 


Techniques  Regression  PCA  ICA  WT  WNN 

Wavelet Thresholding: sensitive to the choice of wavelet basis function
ICA: computational complexity

Conclusions


A novel  and  efficient  method  Wavelet  Neural 
Network  and  its  application  to  EEG  artifact 
removal 

Make  comparisons  with  several  methods 

• ICA 

• Wavelet  Thresholding 

Future  work 

6.1  Virginia  Demonstration  Project  Encouraging  Middle  School  Students  in
Pursuing  STEM  Careers 

Virginia  Demonstration  Project  Encouraging  Middle  School 
Students  in  Pursuing  STEM  Careers 

Jane  T.  Bachman 

Naval  Surface  Warfare  Center  Dahlgren  Division 
Dr.  Dena  H.  Kota 

Naval  Surface  Warfare  Center  Dahlgren  Division 
Aaron  J.  Kota 

Naval  Surface  Warfare  Center  Dahlgren  Division 

Abstract. Encouraging students at all grade levels to consider pursuing a career in Science, Technology, Engineering, and
Mathematics (STEM) fields is a national focus. In 2005, the Naval Surface Warfare Center, Dahlgren Division (NSWCDD), a
Department of Defense laboratory located in Dahlgren, Virginia, began work on the Virginia Demonstration Project (VDP) with the
goal of increasing student interest in STEM education and in pursuing STEM careers. This goal continues as the program enters
its sixth year. This project has been successful through the participation of NSWCDD's scientists and engineers, who are trained as
mentors to work in local middle school classrooms throughout the school year. As an extension of the in-class activities, several
STEM summer academies have been conducted at NSWCDD. These academies are supported by the Navy through the VDP and
the STEM Learning Module Project. These projects are part of more extensive outreach efforts offered by the National Defense
Education Program (NDEP), sponsored by the Director, Defense Research and Engineering. The focus of this paper is on the types
of activities conducted at the summer academy, an overview of the academy planning process, and recommendations to help
support a national plan of integrating modeling and simulation-based engineering and science into all grade levels, based upon the
lessons learned.


Distribution  Statement  A:  Approved  for  Public  Release;  Distribution  is  unlimited, 
Bachman,  Kota,  Kota 


1.0  INTRODUCTION 

Since  2005,  the  Naval  Surface  Warfare 
Center,  Dahlgren  Division  (NSWCDD),  a 
Department  of  Defense  laboratory,  has 
been  working  on  National  Defense 
Education  Program’s  (NDEP)  Virginia 
Demonstration  Project  (VDP)  with  the  goal 
of  increasing  student  interest  in  Science, 
Technology,  Engineering,  and  Mathematics 
(STEM)  education.  One  of  the  VDP  STEM 
focus  events  is  conducting  a summer 
academy  for  middle  school-age  students. 
During  this  week-long  event,  students 
participate  in  a variety  of  STEM  activities. 
Provided  in  this  paper  is  an  overview  of  the 
academy  planning  process  and  a focus  on 
the  types  of  activities  selected  and 
conducted  during  the  academy. 

1.1  Academy  Planning 

The  planning  for  a summer  academy  event 
is  a year-long  process.  Planning  consists  of 
coordination  of  the  event  dates,  facility, 
mentors,  schools,  activity  selections,  mentor 
and  Junior  mentor  training,  scheduling, 
supply  and/or  inventory,  and  execution. 


The  first  objective  is  to  organize  a STEM 
Academy  Planning  Team,  consisting  of 
scientists  and  engineers  (S&Es),  academia, 
and  a middle  school  teacher. 

1.2  VDP  STEM  Academy  Planning 
Team 

The team meets once a month, addressing
planning activities identified in an academy
timeline developed at Dahlgren. The main
goals of the team are to select the academy
dates, review and select the STEM
activities, assign leads for each activity
(most activities are conducted by the
mentors, but during mentor training and
academy week, the authors found it
beneficial to have an assigned activity lead),
select a facility to host the event, and
train mentors and junior mentors. The
planning team becomes the staff executing
the academy event.

1.2.1  Director  and  Coordinator 

The academy staff has a director and a
coordinator. The director manages the
collaboration among mentors, schools,
facility, and training and leads the
monthly planning team meetings. The
coordinator handles the coordination of
publications, supplies, equipment, tools,
inventory, and academy scheduling.

1.2.2  Academy  Training 

The  VDP  STEM  Academy  Planning  Team 
conducts  a three-day  training  event  for  all 
mentors  participating  in  the  academy.  In 
addition,  the  junior  mentors  receive  training 
during  one  of  the  three  days.  All  mentors 
receive  ethical  training  as  well  as  training  on 
inquiry-based  learning  techniques.  The 
academy  schedule,  rules,  procedures,  and 
activities  are  discussed  with  the  mentors. 
Exploratory  laboratory  time  is  worked  into 
the  training  schedule  to  give  mentors 
additional  time  to  spend  on  any  one  of  the 
activities  discussed.  Mentors  are  exposed 
to  all  student  activities,  including  team 
building. 


2.0  ACADEMY  ACTIVITIES 

Each  STEM  activity  is  reviewed  and 
selected  by  the  planning  team  with  the  goal 
of  providing  a wide  range  of  STEM  activities 
that  cover  multiple  careers  for  the  students. 
Once  an  activity  is  selected  by  the  team,  an 
activity  plan  is  derived  and  finalized  for  the 
mentor  manual.  Upon  complete  selection  of 
the  academy  activities,  a schedule  is 
formulated  for  the  week.  Students  this  year 
were  given  the  opportunity  to  explore  STEM 
careers  in  life  science,  robotics,  tower 
design/construction/design  presentation, 
and  data  collection  and  analysis  through 
water  rocket  activities.  Students  also 
participate  in  laboratory  demonstrations, 
listen  to  guest  speakers  discuss  their  STEM 
careers,  and  learn  about  team 
communication  and  collaboration  skills. 
There  is  very  little  downtime  during  the  day 
for  the  students.  Students  are  equipped  with 
a summer  academy  student  manual 
containing  their  activities  and  the 
information  they  need  during  the  week. 


Older students participate in a Junior
Mentor program during the academy. Half of
their day is spent carrying out academy
administrative duties, while the other half is
spent working on an assigned robotics
project. Test engineer is one of the roles
that a junior mentor fills when carrying
out administrative duties. This year, the
project was to build a robot that could
navigate a maze.

2.1  Activity  Descriptions 

• Team  Building.  Mentors  are  provided 
several  team-building  activities  that  they 
can  help  facilitate.  The  team  derives  a 
name  and  constructs  a poster  that  will 
host  their  mission  completion  tags. 

• Life Science. Two life-science activities
are conducted to simulate the types of
ongoing research at naval laboratories.
Students learn about the spread of an
epidemic and possible methods used by
scientists to combat such types of
warfare (see Fig. 1).

Figure 1. Life Science Activity

• Tower  Building.  This  consists  of  several 
phases  of  work  for  the  students.  First, 
the  student  team  decides  on  a design 
for  their  tower.  Second,  they  construct 
the  tower  (see  Fig.  2),  followed  by 
testing  the  strength-weight  ratio.  To 
conclude,  the  team  formulates  its  design 
into  a presentation  that  is  given  in  front 
of  an  invited  panel.  At  the  conclusion  of 
the  team’s  presentation,  the  panel 
conducts  a question-and-answer  period. 

Figure 2. Tower Construction Activity

• Water  Rockets.  Students  construct  a 
water  rocket  and  conduct  several  trial 
tests  to  gather  data.  Following  data 
analysis,  the  students  decide  on  final 
measurements  and  conduct  a final 
water  rocket  test  to  achieve  the  highest 
launch  possible  (see  Fig.  3). 


Figure  3.  Water  Rocket  Activity 

2.1.1  Robotics 

The  robotic  activity  contains  eight  missions. 
Mission  rules  are  established  to  provide 
some  boundaries;  however,  team  creativity 
is  encouraged.  Robotics  boards  contain  a 
challenge  mat  denoting  the  home  base 
location,  island,  troop  rotation,  humanitarian 
aid  drop  area,  ship  rescue  area,  and  dry 
dock. The ‘Map the Underwater Surface’
mission has its own table designed to
represent underwater terrain. Teams can
test as many times as they want (see
Fig. 4) before a test engineer witnesses
the final mission test.


Figure  4.  Robotic  Missions 


• Rescue  the  Swimmer.  Team  members 
will  need  to  rescue  a swimmer.  Starting 
from  home  base,  the  robot  must  be 
capable  of  maneuvering  around  the 
island,  grabbing  the  swimmer  from  a 
known  location,  and  bringing  the 
swimmer  to  shore  by  any  means.  The 
robot  may  use  any  sensor  available  to 
the  team  including  the  rotation  sensors 
built  into  the  motors. 

• Troop  Rotation.  Team  members  must 
transport  troops  from  home  base  to  a 
specific  troop  location  across  the  water 
that  already  contains  a group  of  troops 
on  a platform.  Troops  must  not  touch 
the  water  during  transport.  The  troops 
must  be  wholly  within  the  drop-off  area, 
and  the  robot  may  not  run  over  them  on 
the  return  trip.  The  robot  must  return 
the  original  troops  stationed  at  this 
location  to  home  base.  There  must 
always  be  a minimum  of  five  troops  at 
the  designated  location.,  and  troops  are 
not  to  be  mixed  during  the  transfer.  The 
robot  must  use  at  least  two  sensors. 

• Recover  the  Ship.  Team  members  will 
recover  the  damaged  ship  and  bring  it 
back  to  home  base  by  any  means.  If  the 
robot  turns  the  ship  on  its  side  or  flips  it, 
the  mission  must  be  reattempted.  The 
robot  must  use  at  least  two  sensors. 

• Create an Early Warning Structure.
Team members will need to design a
stationary early warning structure
containing an NXT brick that will act as
a signaling device. The structure will be
placed where the lighthouse is located
on the challenge board. A test engineer
will then start a robot from home base
and direct its movement toward the
tower. As the robot approaches the
structure, a series of signals must
indicate the distance from the robot to
the tower. These distances should be
broken into three range groupings, each
of which corresponds to a unique signal:
more than 10 inches away, 4 to 10 inches
away, and less than 4 inches away. Teams
may select whatever signaling method
they would like. Examples include three
colored lights, changing the NXT
display, or variations in tone. Note:
Each team should have, for the test
engineer, a programmed robot able to
approach the early warning structure at
a slow-to-moderate pace during testing.

• Recover  Beacon.  Team  members  will 
need  to  detect  and  recover  an  infrared 
(IR)  beacon  located  somewhere  in  the 
water.  The  test  engineer  may  place  the 
beacon  anywhere  within  the  beacon 
placement  area  (see  map  of  robotics 
board).  Each  team  searches  the  area 
and  captures  the  beacon.  The  robot 
must  use  at  least  an  IR  seeker. 

• Map  the  Underwater  Surface.  Team 
members  will  need  to  create  a map  of 
underwater  terrain  using  the  ultrasonic 
sensor  and  the  data  collection  features 
of the robotic software. The map must
provide depth and distance traveled in
inches. It should be scaled properly and
reported to a test engineer on graph
paper.

• Dry Dock. The teams’ robots will need to
leave from home base, drive up onto the
top  level  of  the  dry  dock,  display  a 
message,  remain  there  for  five  seconds, 
drive  off  of  the  dry  dock,  and  return  to 
home  base.  The  robots  may  use  any 
sensor  available  to  the  teams  including 
rotation  sensors  built  into  the  motors. 

• Humanitarian  Aid.  Each  team  must 
deliver  five  crates  of  humanitarian  aid  to 
the  designated  location.  The  robot  may 
use  any  sensor  available  to  the  team 
including  the  rotation  sensors. 
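
The range-grouping rule in the 'Create an Early Warning Structure' mission above amounts to a small classification step. The Python sketch below is purely illustrative: the academy robots are programmed with the NXT robotic software rather than Python, the function name and signal labels are hypothetical, and treating the 4-inch and 10-inch boundaries as part of the middle group is an assumption, since the mission text does not say which group the boundary values belong to.

```python
def warning_signal(distance_in):
    """Map a measured distance in inches to one of three signal groups."""
    if distance_in < 0:
        raise ValueError("distance cannot be negative")
    if distance_in > 10:
        return "far"   # more than 10 inches away
    if distance_in >= 4:
        return "mid"   # 4 to 10 inches away (boundaries assumed inclusive)
    return "near"      # less than 4 inches away


# Hypothetical ultrasonic readings as a robot approaches the structure.
for reading in [15.0, 10.0, 6.5, 4.0, 2.0]:
    print(reading, warning_signal(reading))
```

Each label would then drive whichever signaling method the team chose, such as a colored light, a display message, or a tone.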


2.1.2  Academy  Token  Plan 

The  VDP  STEM  Academy  Planning  Team 
generated  a plan  where  tokens  serve  as  the 
students’  form  of  currency  for  the  week. 
Each  time  a team  attempts  to  complete  a 
robotic  mission,  a token  must  be  paid  to  the 
test  engineer.  The  tokens  provide  students 
with  an  incentive  program  during  their 
summer  academy  week.  The  role  of  'token 
master’  is  served  by  a junior  mentor  who, 
under  the  supervision  of  the  academy 
director,  is  in  charge  of  distributing  tokens. 
Teams  begin  the  week  with  10  tokens  and 
the  plan  details  events  throughout  the  week 
in  which  teams  can  earn  tokens  based  on 
their  accomplishments  and  teamwork. 


3.0  LESSONS  LEARNED 

Following the academy, the VDP STEM
Academy Planning Team’s first meeting is
to identify the lessons learned and to
discuss the “do’s and don’ts” for next year.
Below is a list of this year’s lessons learned.

• Facility. A gym that provides enough
space for the robotic boards and sixteen
team tables is an ideal venue for the
event.

• Teams. Seven-member teams are less
favorable than five-member teams.

• Career speaker. Having guest speakers
open morning sessions with a 10-15
minute brief on their careers is
beneficial.

• Early arrivals. Students can watch
LabTV while waiting for other buses to
arrive at the academy site.

• Robotics.  Add  robotics  refresher 
training  time  for  junior  mentors. 

• News  flash.  A daily  academy  news 
board  that  junior  mentors  can  coordinate 
and  monitor  should  be  considered. 

• Rocket  launcher.  Teams  discovered 
that  this  year’s  launcher  operated  better 
than  last  year’s  (see  Fig.  5). 

Figure 5. Water Rocket Launcher


• Internet access. Student teams use the
Internet for research into tower building
and for generating their briefs.

• Junior  mentor  project.  The  academy 
team  agreed  that  the  maze  navigation 
was  a good  project  for  this  group. 

• Laboratory  demonstrations.  The 
academy  team  used  this  activity,  which 
the  students  enjoyed,  as  one  of  the  focal 
points  (see  Fig.  6)  on  the  last  day  of  the 
academy. 


Figure  6.  Student  Laboratory 
Demonstration  Participation 


4.0  CONCLUSION 

The NDEP VDP STEM Summer Academy
program provides middle school students
with “hands-on” activities that contain
challenges in both robotics and engineering
problems. Teaming middle school teachers
with practicing S&Es at NSWCDD is one
way the VDP STEM Academy generates
student interest in math and science. With
respect to integrating modeling and
simulation (M&S)-based engineering into
the program, the planning team used
simulation in the two life science
activities, in directing students to build
a tower model, and in organizing the
robotic challenges so that the students
built and programmed a robot to simulate a
Navy initiative using sensors and motors.
The academy team recommends the following
considerations:

1)  Involve  a math  and/or  science 
teacher  of  the  targeted  grade  early 
in  the  activity  decision-making 
process.  The  teacher  can  assess 
the  activity  with  the  skill  set  the 
students  have  and  determine 
whether  the  M&S  activity  is 
acceptable. 

2)  Run  through  the  exercise  prior  to  the 
planning  team’s  activity  review. 

3)  Generate  an  M&S  activity  plan  to 
include  purpose,  design,  analysis, 
and  test. 

4)  Prepare  training  material  for  the 
mentors  that  includes  all  pertinent 
information  regarding  the  activity. 

5)  Evaluate  the  results  from  first  use  of 
the  M&S  activity,  and  determine  if 
adjustments  are  needed  or  if  the 
activity  should  be  removed  from  the 
curriculum.  The  VDP  STEM 
Academy  planning  team  decided  to 
change  the  junior  mentor  robotics 
project  to  maze  navigation  this  year, 
which  the  junior  mentor  team 
successfully  concluded. 

6)  Create team roles that allow
students to hold a responsible lead
for the team. Roles may include
data manager, team supply
manager, robotic maintenance
manager, and water rocket data
recorder.

7)  Collect  feedback  from  mentors  at  the 
end  of  the  event  and  use  for  future 
improvements. 

In conclusion, providing a working
environment experience in which students
can sense why, what, and how things are
done through interaction with S&Es and
math and science teachers can benefit them
when they begin making career decisions.

6.2  Using  Game  Development  to  Engage  Students  in  Science  and
Technology 

Using  Game  Development  to  Engage  Students  in 
Science  and  Technology 

John  Wiacek 

ECPI  College  of  Technology 
jwiacek@ecpi.edu

Abstract. Game design workshops, camps and activities engage K-12 students in STEM disciplines that use game engine and
development tools. Game development will have students create games and simulations that will inspire them to love technology
while learning math, physics, and logic. By using tools such as GameMaker, Alice, Unity, GameSalad and others, students will get a
sense of confidence and accomplishment creating games and simulations.


1.0  NOMENCLATURE 

STEM : Science, Technology, Engineering,
and Math
NPC : Non-Player Character, computer-
controlled character
GUI : Graphical User Interface
STEP : Science and Technology
Enrichment Program

2.0  WHAT’S  THE  PROBLEM? 

In  this  day  and  age  of  science  and 
technology,  students  are  struggling  more 
than  ever  with  math,  science,  and 
technology.  There  are  several  countries 
across  the  world  that  have  noticed  a decline 
in  their  math  and  science  scores  over  the 
last  couple  of  decades.  Almost  30  percent 
of  students  in  their  first  year  of  college  are 
forced  to  take  remedial  science  and  math 
classes  because  they  are  not  prepared  to 
take college-level courses [1]. The United
States has addressed this by creating “A
National Action Plan for Addressing the
Critical Needs of the U.S. Science,
Technology, Engineering, and Mathematics
Education System.” [1] Other countries have
taken similar steps in addressing this
decline.

"The  United  States  possesses  the  most 
innovative,  technologically  capable 
economy  in  the  world,  and  yet  its 
science,  technology,  engineering,  and 
mathematics  (STEM)  education  system 
is  failing  to  ensure  that  all  American 
students  receive  the  skills  and 
knowledge  required  for  success  in  the 
21st  century  workforce....  To  succeed  in 
this new information-based and highly
technological society, all students need
to  develop  their  capabilities  in  STEM  to 
levels  much  beyond  what  was 
considered  acceptable  in  the  past.... 
Strengthening  STEM  education  across 
the  nation  is  critical  to  maintaining  a high 
quality  of  life  for  our  citizens  and 
ensuring  that  Americans  remain 
competitive  in  international  science  and 
technology.  Public  awareness  and  action 
are  critical  to  addressing  this  crisis."  [1] 

To respond to this, many schools have
reviewed their curricula and modified their
teaching methods to focus on test questions
more than the course material [2]. This is
not a long-term
solution;  it  just  addresses  the  symptoms 
(the  test  scores)  rather  than  the  cause  (the 
lack  of  student  understanding). 

2.1  Why  is  this  happening? 

Over  the  last  fifty  years  or  so  what  has 
changed?  Why  is  this  becoming  a problem? 
The  reasons  differ  quite  a bit  for  each  region 
in  the  world.  However,  as  reviewed  below, 
some  reasons  transcend  regional 
differences. 

One  of  these  reasons  is  the  perception  that 
STEM  topics  are  too  hard  and  the  student 
will  fail  at  them.  The  test  is  not  the  problem, 
but  the  subject.  Before  even  getting  started, 
many  students  have  already  given  up.  Math 
has  the  biggest  stigma;  most  students  start 
their  struggle  here  and  this  leads  to  a similar 
problem  in  their  future  science,  engineering 
and  technology  study  since  they  are  based 
on math.

Another big reason is the real-world
examples used in current textbooks and
classes. The examples used are not
necessarily real-world from the student’s
point of view and can be hard to
understand.

Students  are  affected  greatly  by  peer 
pressure,  and  the  majority  of  the  STEM 
curriculum  falls  in  an  unpopular  category. 
This  gives  students  even  less  motivation  to 
want  to  succeed  in  these  areas  of  their 
studies. 

School  systems  everywhere  are  being 
strained  with  smaller  budgets  and  more 
students,  making  it  difficult  for  students  to 
get  the  help  they  need.  Students  learning 
the  material  don’t  get  enough  help  figuring 
out  how  to  solve  the  problems  that  are  given 
to  them  and  how  to  estimate  possible 
solutions. In the end, many students just
turn in a number hoping that it’s correct.

These are just a few of the bigger reasons
for  the  decline  in  STEM  education  and  a 
need  for  improvement.  This  does  not 
account  for  many  other  reasons  that  might 
be  political,  cultural,  or  other  in  nature. 

2.2  Overcoming  the  Problem 

These  problems  of  getting  the  next 
generation  to  like  and  be  good  at  STEM  and 
see  it  as  a viable  possibility  for  their  future 
careers  can  be  overcome.  We  have  noticed 
while  teaching  game  and  simulation 
development  for  over  ten  years  that  many  of 
these  problems  can  be  overcome  fairly 
easily  with  the  right  tools  and  activities.  We 
take  a look  at  alternative  solutions  that  get 
students  to  learn  while  having  fun  and 
making  their  own  games. 

2.2.1  The  Subject  of  Game 
Development 

Game  development  has  a great  appeal  to 
the  majority  of  the  younger  students,  and 
this  by  itself  is  enough  to  increase  students’ 
confidence  in  themselves.  They  consider 
themselves  experts  at  games  from  day  one. 
Younger  students  also  have  a key 
knowledge  of  how  a video  game  works,  and 


they  know  what  needs  to  be  on  the  screen 
to  verify  that  everything  is  working  properly, 
and  can  check  their  own  solutions.  The 
average  student  in  the  United  States  has 
over  five  thousand  hours  of  gameplay 
experience  [3]  giving  them  confidence  when 
checking their solutions. All this time and
experience with video games starts them off
with a mental preparation for success and
very often a passion for video games. Many
of them have also played video games that
they modified or created new levels for with
the games’ built-in editors. These built-in
editors,  however,  are  not  very  good  for 
teaching  because  they  have  been  simplified 
to  let  anyone  create  their  own  levels  easily 
and  often  lack  documentation. 

The  subject  of  video  games  also  creates  a 
peer-pressure  environment  that  motivates 
students  to  do  better.  They  strive  to  create 
games  that  will  impress  their  peers  in  the 
classroom  and  friends  outside  of  class. 

Often  students  will  put  in  more  time  in 
making  and  polishing  a game  or  a 
simulation  than  they  would  anything  else  in 
their  studies. 

We  have  seen  at  ECPI  that  the  game  and 
simulation  courses  have  an  attendance  rate 
ten  percent  higher  than  other  technical 
classes  in  their  major.  This  also  improves 
their  pass  rate  in  these  courses  by  a similar 
percentage.  We  have  also  seen  this  same 
trend  with  camps  that  we  have  offered  to 
high  school  students  compared  to  camps  in 
other fields [4].

Other  institutions  such  as  Purdue 
University,  MIT,  NASA  and  more  have 
created  games  or  tools  to  appeal  to  the 
younger  generations.  They  have  created 
these  games  and  tools  to  bring  new  blood  to 
their  respective  industries.  Robert  Morris 
University  did  a survey  of  fourth  graders 
before  and  after  a STEP  camp.  The 
students  came  to  the  college  four  hours 
every  week  for  eighteen  weeks,  with  their 
time  split  equally  between  technology  and 
sciences  [5].  For  young  students  attending 
camps at ECPI, we found that 36% of the
students surveyed before the camp were
interested in a career that involved STEM
compared to 68% after the camp. This
significant change can be attributed
largely to the students starting to like the
subject, especially at such an
impressionable age.

2.2.2 Math, the Building Block

Math,  being  the  building  block  for  science, 
engineering  and  technology,  needs  to  be 
addressed  first.  Game  development  can  be 
used  to  teach  math  using  simple  classic 
games  such  as  Breakout,  Space  Invaders, 
and Asteroids. The math, logic, and rules in
these games are simple, making them the
ideal choice. We also start by using a game
engine that has a graphical programming
language requiring no programming
experience, such as GameMaker or
GameSalad. All this allows us to focus on
the logic and basic math.

Recreating  and  expanding  many  of  the  early 
games  can  be  used  to  introduce  the 
following math concepts:

• Unit Conversion - by converting screen
coordinates to world coordinates in the
game

• Cartesian Coordinates - by using basic
movement to calculate character
positions

• Functions - by using the built-in
functions and creating new ones for
power-ups in games

• Sine, cosine - for player rotation

• Statistics and probability - for creating
random distributions for NPCs, power-ups,
and dealing damage
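
Three of the concepts listed above can be sketched in a few lines of Python. This is only an illustration, not code from the workshops: the screen size, the pixels-per-unit scale, the power-up names, and their weights are all assumed values chosen for the example.

```python
import math
import random

# Hypothetical values chosen for this illustration only.
SCREEN_W, SCREEN_H = 640, 480   # screen size in pixels
PIXELS_PER_UNIT = 32.0          # 32 pixels correspond to 1 world unit


def screen_to_world(px, py):
    """Unit conversion: pixel coordinates (origin top-left, y pointing down)
    to world coordinates (origin at the screen centre, y pointing up)."""
    wx = (px - SCREEN_W / 2) / PIXELS_PER_UNIT
    wy = (SCREEN_H / 2 - py) / PIXELS_PER_UNIT
    return wx, wy


def heading_vector(angle_degrees):
    """Sine and cosine: unit vector for a player facing angle_degrees,
    measured counter-clockwise from the positive x axis."""
    a = math.radians(angle_degrees)
    return math.cos(a), math.sin(a)


def pick_power_up(rng):
    """Probability: choose a power-up with unequal chances (1 : 3 : 6)."""
    names = ["extra_life", "wide_bat", "slow_ball"]
    weights = [1, 3, 6]
    return rng.choices(names, weights=weights, k=1)[0]
```

A student can check each function the same way they check a game: click the centre of the screen and confirm that `screen_to_world(320, 240)` gives the world origin, or turn the player to 90 degrees and confirm that it moves straight up.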

These touch on just some of the simpler
concepts; many more math concepts can be
introduced in similar ways. Some of the
more advanced math, such as matrix
manipulation and the complex number
system, may require a more advanced
game engine such as Unity or Torque with
some C++-like programming.

Unlike many word problems in math, where
students struggle to understand the
problem, the calculations, and the process
for checking a solution, in a game the task
is clear and the student can focus on
solving the problem. They can check their
solution by playing the game and
immediately know if there is a problem with
their solution.

Having these visual cues from the game,
they can see what is wrong and can usually
solve the problem by themselves, along the
way learning the concepts in greater depth.


In Fig. 1 you see one of the games we used
to teach the Cartesian coordinate system to
fourth and fifth graders, based on the classic
Breakout game. The students loved the
lesson and did not even realize that it
involved math. The students set up the ball
movement, bat, and blocks in the game
using basic GUI commands and screen
coordinates that are very similar to
Cartesian coordinates. Math is where
we had the greatest improvement for our
third and fourth graders from our STEP
program in comparison to their peers from
over two hundred students surveyed [5].

Figure 1

2.2.3 Science

Game  development  requires  the  direct  use 
of  physics,  while  the  other  sciences  can  be 
included at best indirectly. Most game
engines have physics support built in, such
as PhysX®, and we take advantage of this in
our lessons.

The  easiest  way  to  start  teaching  science 
with  games  is  with  Newtonian  physics  and 
remaking  games  such  as  Super  Mario 
Brothers,  Gran  Turismo,  Donkey  Kong,  and 
Lunar  Lander.  These  games  can  be  used  to 
introduce  many  concepts  and  formulas  such 
as  gravity,  acceleration,  momentum,  friction, 
mass,  force,  terminal  velocity,  torque,  and 
levers.  The  concepts  can  be  used  to 
compare  and  contrast  using  actual  physics 
versus  guesstimating,  allowing  the  students 
to  see  how  the  difference  affects  the 
behavior  of  the  game.  This  means  more  to 
the  student  than  “real-world  examples”  or 
just  reading  a problem  and  finding  a number 
that  is  the  solution.  When  customizing  these 
games,  students  will  be  able  to  manipulate 
variables  such  as  force,  mass,  acceleration 
and  time,  and  get  a better  understanding  of 
these  concepts.  Using  the  game  engine  will 
allow  them  to  experiment  with  variables  that 
would  not  be  safe  or  even  possible  in  a lab 
environment. 
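
As an illustration of the kind of experiment described above, the following Python sketch integrates a Lunar Lander style vertical descent. It is a minimal sketch, not the lesson code: the function name, the time step, and the starting height are assumptions made for the example, and the default gravity of 1.62 m/s² is the Moon's surface gravity.

```python
def simulate_descent(height, gravity=1.62, thrust_accel=0.0, dt=0.01):
    """Integrate a vertical descent from rest using small time steps.

    height       -- starting height above the surface in metres
    gravity      -- downward acceleration in m/s^2
    thrust_accel -- constant upward acceleration from the engine in m/s^2
    Returns (time_to_touchdown, impact_speed).
    """
    if thrust_accel >= gravity:
        # The lander would hover or climb and never touch down.
        raise ValueError("thrust must be weaker than gravity to land")
    velocity = 0.0  # m/s, positive is upward
    t = 0.0
    while height > 0.0:
        accel = thrust_accel - gravity   # net acceleration
        velocity += accel * dt           # update velocity first
        height += velocity * dt          # then position
        t += dt
    return t, abs(velocity)
```

Changing `gravity` or `thrust_accel` and comparing the impact speed with the value predicted by v² = 2ah lets students contrast actual physics with a guessed constant, including settings that could never be tried in a lab.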

Other  sciences  such  as  biology  can  also  be 
taught  by  having  the  students  develop 
serious  games  to  teach  the  concepts.  These 
games  can  range  from  simple  games  based 
on  the  action  genre  that  students  create,  to 
MMO/Adventure  type  games  that  will 
incorporate  the  science  as  part  of  the 
gameplay  that  students  design.  Here  are  a 
few  games  that  have  used  these 
approaches that students can create:

• “Moon Base Alpha” - NASA’s MMO

• “Spore” - Electronic Arts

• “Math Blaster” - Davidson (Nintendo
and Sega later)

• “The Incredible Machine” - Sierra

• “Marble Madness” - Atari

• “Critical Mass” - Purdue University

Creating  some  of  these  games  and 
simulations  will  require  more  advanced 
features  in  the  tools  than  the  most  basic 
game  engines  can  offer,  requiring  a more 
advanced  tool  such  as  Unity. 


Figure  2 


Fig.  2 was  created  by  ECPI  students.  It 
shows  the  USS  Monitor  floating  in  rough 
seas. Here the students not only had to
create the ship itself, but also had to work
out the buoyancy and place the mass of the
ship, turret, and engine in the correct
locations. After putting all this together,
they saw that the stern of the ship sits
lower than the bow.
They  thought  that  they  had  made  an  error 
but  after  looking  at  historical  pictures,  they 
saw  this  was  correct. 
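
The reasoning behind that observation can be sketched as a centre-of-mass calculation. The masses and positions below are invented for illustration and are not data for the real USS Monitor or for the students' model; the sketch only shows why a heavy engine placed aft pulls the stern down.

```python
def centre_of_mass(parts):
    """Return the position of the combined centre of mass along the ship.

    parts -- list of (mass, position) pairs, where position is the distance
             from midship in metres (positive toward the bow).
    """
    total_mass = sum(mass for mass, _ in parts)
    return sum(mass * position for mass, position in parts) / total_mass


# Hypothetical numbers, for illustration only.
example_ship = [
    (600.0, 0.0),    # hull, centred at midship
    (120.0, 2.0),    # turret, slightly forward of midship
    (200.0, -12.0),  # engine, well aft
]
example_com = centre_of_mass(example_ship)
```

With these assumed values the centre of mass lies aft of midship (a negative position), so a hull whose buoyancy is centred at midship would trim by the stern, which matches what the students saw.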

2.2.4  Engineering 

Engineering is a great place to apply the
math and physics that the students learned
using game development tools in the initial
lessons. Game development tools have
support for many of the principles needed
for engineering. This allows students to
create games and/or simulations for building
bridges, ships, cars, ecosystems, and more.
We  have  had  our  students  create  all  of 
these  as  simulations  or  games. 

However, only some of the examples can
be done with basic tools; creating most of
these engineering examples requires more
advanced tools such as Unity, Torque, or
Alice. The ones that can be implemented
easily are:

• A 2D bridge building game where
students  have  a challenge  of  building  a 
bridge  that  spans  a river  or  valley. 

• An  ecosystem  simulation  that  you  set  up 
for  a pond  where  the  user  has  to 
balance  the  system  so  that  nothing  dies 
out. 

• Simulating  traffic  at  an  intersection, 
stretch  of  highway,  or  parking  lot  to  see 
what  bottlenecks  are  there  and  look  at 
possible  fixes. 


Figure  3 


In Fig. 3 you see a pond simulation
that our students have completed.
They have to do the research and get help
with biology, not only programming. While
creating it, they learn about ecosystems,
and after completing the pond, they can
experiment with trying to balance it or
seeing how easily an ecosystem can be
broken.
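
One simple way to model such a balance is the classic predator-prey (Lotka-Volterra) model. The sketch below assumes that model with two populations, algae and fish, and made-up rate constants; the students' actual pond rules are not described in this paper and may differ.

```python
def step_pond(algae, fish, dt=0.01,
              growth=1.0, grazing=0.1, conversion=0.05, death=0.5):
    """Advance the pond by one small time step.

    Assumed rules: algae grow on their own and are eaten in proportion to
    how often fish meet algae; fish multiply when they eat and otherwise
    die off at a constant rate. All rate constants are illustrative.
    """
    d_algae = growth * algae - grazing * algae * fish
    d_fish = conversion * algae * fish - death * fish
    return algae + d_algae * dt, fish + d_fish * dt


def run_pond(algae, fish, steps=1000, **rates):
    """Return the list of (algae, fish) values over the whole run."""
    history = [(algae, fish)]
    for _ in range(steps):
        algae, fish = step_pond(algae, fish, **rates)
        history.append((algae, fish))
    return history
```

With the default rates the pond is balanced at 10 units of algae and 10 units of fish. Starting with 20 units of fish instead makes the algae decline, which is the kind of breakage the students can explore.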


Figure  4 


2.2.5  Technology 

Using game development tools to teach
science, engineering and math also teaches
technology at the same time. This helps
build computer literacy and logic skills with
every lesson that the students do.

The  computer  literacy  skills  that  the 
students  learn  in  the  lessons  include:  file 
management,  file  types,  compiling  projects, 
image  editing,  web  research  and  more. 
Whatever  field  the  student  eventually 
enters,  these  skills  will  be  required.  While 
learning  the  game  and  simulation  tools,  the 
students  learn  logic,  design  and 
programming.  These  tools  introduce 
students  to  programming  concepts  such  as 
functions,  variables,  inheritance,  arrays,  and 
conditional  statements. 

3.0  RESULTS 

With  STEM  subjects  not  being  the  most 
glamorous  in  students’  eyes,  games  and 
game  development  help  address  this.  The 
first question to be answered is “Do these
activities change students’ outlook on STEM
fields?” We see this happening with all our
students,  but  the  ones  that  are  influenced 
the  most  are  the  fourth  graders.  This  is 
supported  by  before  and  after  surveys  of 
fourth  and  fifth  graders  that  show  an 
increase  of  32%  in  their  views  of  science 
and  technology  as  a career  and  38% 
increase  in  liking  science  and  technology. 
These  are  our  most  easily  influenced  group. 

The  high  school  students  we  had  in  our 
game  development  camps  were  there 
voluntarily.  Camps  offering  game 
development  were  filled  to  capacity  with  a 
waiting  list  while  camps  on  other  STEM 
topics  had  seats  open,  leading  to  the 
conclusion  that  the  addition  of  game 
development  into  the  curriculum  increased 
student participation. From this
perspective, game development works to
attract high school students to STEM.

The attendance for ECPI game and
simulation  classes  is  just  over  10%  higher 
than  in  other  computer  science  classes  for 
the  same  students,  and  their  tardiness  is 
reduced  as  well.  All  the  classes  compared 
started  and  ended  at  the  same  time. 

4.0  CONCLUSION 

The  biggest  problem  is  getting  faculty  that 
can  teach  with  the  game  development  tools, 
especially  for  the  advanced  lessons  that 
require  the  higher-level  game  and 
simulation  tools.  Game  development  and 
games  will  inspire  the  next  generation  to  like 
technology.  It  will  make  them  better  at  the 
sciences, engineering and math. The exact
size of the benefit is still unknown, and
additional studies need to be done to
measure the long-term effectiveness of the
initiatives.

Abstract. How to integrate simulation-based engineering and science (SBES) into the science curriculum smoothly is a challenging
question. For the importance of SBES to be appreciated, the core value of simulations (that they help people understand natural
phenomena and solve engineering problems) must be taught. A strategy to achieve this goal is to introduce computational
experiments to the science curriculum to replace or supplement textbook illustrations and exercises and to complement or frame hands-on
or wet lab experiments. In this way, students will have an opportunity to learn about SBES without compromising other learning
goals required by the standards, and teachers will welcome these tools as they strengthen what they are already teaching. This
paper demonstrates this idea using a number of examples in physics, chemistry, and engineering. These exemplary computational
experiments show that it is possible to create a curriculum that is both deeper and wider.


1.0  INTRODUCTION 

Before starting this paper, I feel obliged to
explain how I use the terms “modeling” and
“simulation.” When studying the function or
behavior of a system that involves
time-varying properties, I prefer to use
“simulations.” If only the structure or
configuration of a system is concerned, I
prefer using “models.” For example, a protein
model describes a protein structure in a
stable state, whereas a protein dynamics
simulation describes a protein folding or
binding process. But this distinction is just
personal. For the most part, the words
“modeling” and “simulation” or “model” and
“simulation” can be used quite
interchangeably.

Simulation-based engineering and science
(SBES) is defined as the discipline that
provides the scientific and mathematical
basis for simulating natural and engineered
systems [1]. SBES is increasingly important
in accelerating research and development
because of the analytical power and cost
effectiveness of computer simulation.
Advanced simulation tools based on solving
fundamental equations in physics are
routinely used to understand natural
phenomena and solve engineering problems.
SBES is an interdisciplinary subject
indispensable to the nation’s continued
leadership in science and technology [2].

SBES, however, has virtually no place in the
current science and engineering curriculum
frameworks at the secondary level. Despite
the fact that modern simulation tools can
run on an average computer and be used
just like an ordinary application, it is still
commonly thought that SBES mandates
advanced mathematics and science, uses
abstruse jargon, requires monster
supercomputers, works only through the
esoteric command line, and cannot possibly
be taught or used at the secondary level. As
a result, most students are not informed of
the modern concepts of SBES and are
deprived of an opportunity to develop an
interest in it earlier in their education. This
deficiency may have contributed to the
erosion of the nation’s leadership in SBES
and in engineering and science in general
[1].

One of the purposes of this
multidisciplinary conference is to create a
dialogue that will lead to the development
of a national plan for integrating SBES into
K-20 education, a literacy framework for
SBES, and a research agenda.

How to integrate SBES into the secondary
curriculum is a tricky, open question. The
U.S. science curriculum is often criticized
as being “a mile wide and an inch deep” [3].
Incorporating SBES into the existing
curriculum must be reconciled with the
growing national consensus around the need
for “fewer, higher, clearer” education
standards [4]. The majority of science
educators will need to be convinced that the
integration of SBES into their curricula will
be realistic, constructive, and helpful.

This paper suggests an implementable
integration strategy that uses the products
of SBES to help teachers achieve their goals
in deepening students’ conceptual
understanding, as set by the new National
Science Education Standards [4], and in
doing so, conveys the core values of SBES
to both students and teachers. To be
practical, I will demonstrate how this
strategy may work using a few concrete
examples in physical science and
engineering. Each of these examples shows
how students can use a visual, interactive,
and constructive simulation tool to conduct
and design a sequence of computational
experiments to explore a broad set of
concepts in great depth. These examples are
based on my work on creating simulation
tools for science education using
research-grade computational methods for
solving fundamental physical laws. Resorting
to first principles in physics to build
educational tools may be considered overkill
by some educational developers, but it is
essential to bringing learning experiences
with authentic science to the classroom.
More importantly, it opens up new, profound
opportunities for deeper learning that would
not exist otherwise, as the examples will
show. Having a strong root in SBES, the
examples may be used as stimulating
introductions to the theory and practice of
the corresponding computational methods
behind the scenes. By employing
well-established pedagogical principles such
as design-based learning [5] or
constructionism [6] in the curriculum to
encourage students to create simulations to
answer questions, solve problems, or design
systems, learning secondary science and
learning SBES can have a significant
overlap, and a mutual enhancement can
consequently occur.

2.0  WHAT  IS  A COMPUTATIONAL 
EXPERIMENT? 

A computational experiment is a computer
simulation of a real experiment or a
computer implementation of a thought
experiment. Computational experiments
complement analytical theories and real
experiments by providing a tool for
explaining what was observed and predicting
what will happen. With all this explanatory
and predictive power, computational models,
the machinery of computational experiments,
have become a key element of science [7].
They are now also considered a pillar of
science education, in parallel to mental and
physical models, if appropriate user
interfaces are provided to make them
accessible to every student.

Figure 1: A computational experiment for
studying states of matter. The images
show if and how molecules of a gas,
liquid, and solid that students place into a
container will fill the space and if they can
be compressed. Gas: a, b, c; liquid: d, e, f;
solid: g, h, i.

Figure 1 shows a computational experiment
for investigating the molecular mechanisms
underlying some key macroscopic
properties of a gas, liquid, and solid.
Students can add different types of
molecules into a container maintained at a
given temperature (Fig. 1a, d, g). If the
molecules form a gas, they will quickly fill
up the entire container and move rapidly and
chaotically (Fig. 1b). If they form a liquid,
they will fill up the lower part of the
container and move randomly around each
other (Fig. 1e). If they form a solid, they
will stay together to form an ordered
structure (Fig. 1h) and vibrate around some
fixed positions. Now if we apply some
pressure through a piston, the gas will be
significantly compressed (Fig. 1c), whereas
the liquid and solid can hardly be
compressed (Fig. 1f, i).

As you can see from Fig. 1, the most
important properties of the three states of
matter are all explained on the molecular
basis in a single computational experiment
using just the mouse. All the related
concepts such as the Kinetic Molecular
Theory, diffusion, lattice vibration, and
pressure, however disparate they may be in
a textbook, are linked and unified in the
very same computational model and can be
manifested or inquired into through the
computational experiment. With a variety of
visualization and analysis tools, many more
details can be discovered. For instance,
students can select a molecule and visualize
its trajectory to examine how it moves in
different states. The force vector, velocity
vector, or kinetic energy shading on each
atom can be shown to provide further
information about molecular collisions.


Figure 2: The effect of interatomic
interactions. The potential well depths are:
(a) 1 eV, (b) 0.5 eV, and (c) 0.01 eV,
respectively. The shape of the interatomic
potential function is shown in (d).


3.0  GOING  DEEPER,  REACHING 
WIDER 

The example shown in Fig. 1 demonstrates
how a computational experiment can
transform the way we teach states of matter.
The new way is simple enough to be
applicable even to lower grades. But the
computational model has a lot more to offer.
The example shows only the tip of the
iceberg. What lies beneath is a great number
of opportunities to go deeper and reach
wider.

The computational experiment shows how
matter in different states behaves without
further explaining what makes it behave
so. Figure 2 shows the equilibrium
conformations of a molecular system
corresponding to three different strengths
of interatomic interactions, which students
can adjust through a graphical user interface
shown in Fig. 2d. This step takes students
down to the level of studying the states of
matter in terms of interaction and energy. It
offers an intuitive explanation of the idea
that stronger attractions result in stronger
materials.

With a few more mouse clicks, many other
concepts can be studied. The material
strength can be tested by increasing the
pressure on the piston and observing how
the solid is deformed, which explains
plasticity. By increasing the temperature,
the solid will become more ductile, which
then explains the business of a blacksmith.
When the temperature continues to rise, the
solid will melt down into a liquid to fill the
container, showing a phase change. If the
temperature keeps rising, the liquid will turn
into a gas and start to push the piston up,
causing a dramatic volume expansion. This
explains how heat can do work. Gas laws can
also be studied. Students can discover that
under the same pressure, higher temperature
will result in greater volume, and under the
same temperature, higher pressure will
result in smaller volume. Students can even
ask questions not covered in most curricula
about gas laws. For example, the mass of
the molecules can be changed to see if it
affects the equilibrium volume of the gas
under the same pressure and temperature.
Visually, it seems that the mass of
molecules should affect the volume, as more
massive molecules appear to move more
slowly. The frequency at which they bump
into the piston is, therefore, lower. But a
computational experiment simply shows that
molecular mass has no effect, just as
suggested by the Ideal Gas Law: PV=nRT.
This is not easy to figure out just by
thinking. Even if one can reason correctly
that a more massive molecule will deliver a
greater impact when it collides with the
piston, some mathematical work is still
needed to prove that the two effects cancel
out exactly. Where the mathematical skill
prevents the majority of students from
investigating further, the computational
experiment helps them move forward. Beyond
investigating the effect of mass, students
can even adjust the atomic radius and the
van der Waals attraction to explore how the
equation of state deviates from the Ideal
Gas Law. A comparison of two gases that
differ only in atomic radius or van der Waals
potential energy shows the effect of
excluded volume or intermolecular
attraction.
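
The mathematical work behind the mass cancellation can be sketched with a standard kinetic-theory argument. This is a textbook derivation under the usual ideal-gas assumptions (point particles, elastic collisions with the piston, no intermolecular forces), not output from the simulation. The symbols N (number of molecules), m (molecular mass), v_x (velocity component toward the piston), L (container length along that direction), A (piston area), V = AL, and k_B (Boltzmann constant) are introduced here for illustration.

```latex
% One molecule bouncing between the piston and the opposite wall:
% momentum delivered per hit, hits per unit time, and their product.
\Delta p = 2 m v_x, \qquad f = \frac{v_x}{2L}
\quad\Longrightarrow\quad
\bar{F}_1 = \Delta p \, f = \frac{m v_x^2}{L}

% Sum over N molecules and divide by the piston area A (V = A L):
P = \frac{N m \langle v_x^2 \rangle}{A L}
  = \frac{N m \langle v_x^2 \rangle}{V}

% Temperature fixes the mean kinetic energy per direction, not the speed:
\tfrac{1}{2} m \langle v_x^2 \rangle = \tfrac{1}{2} k_B T
\quad\Longrightarrow\quad
P V = N k_B T = n R T

% Van der Waals form with excluded volume b and attraction a:
\left( P + \frac{a n^2}{V^2} \right) (V - n b) = n R T
```

A heavier molecule delivers more momentum per hit (proportional to m v_x) but hits less often (proportional to v_x), and the product m v_x² is fixed by the temperature alone, so m drops out. The last line is the standard van der Waals equation of state, in which the constants a and b play the roles of the intermolecular attraction and excluded volume that students vary in the experiment.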

It  is  probably  not  an  exaggeration  to  assert 
that  the  possibility  of  inquiry  is  only  limited 
by  the  imagination  of  the  experimenter.  The 
breadth  and  depth  of  the  science  embodied 
in  this  computational  experiment,  along  with 
the  ease  of  inquiry  afforded  by  a graphical 
user  interface,  suggest  the  feasibility  of 
creating a curriculum that is both wide and
deep using this type of simulation.

This prospect would not have been possible
without using authentic science to build the
educational tool. The scientific power
demonstrated above originates from the
application of the molecular dynamics method
[8], which is an important tool in SBES for
studying nanoscale science and engineering
[9]. The computational experiment described
above was designed and conducted using the
Molecular Workbench software, which has a
classical molecular dynamics tool
tailor-made for science education [10].

I hope you are reasonably inspired by this
introductory example. In the following
section, I will discuss more about the need
to use true science to build educational
tools and the implication of this for SBES
education. Then I will show more examples
in other disciplines in later sections.

4.0  WHY  USE  ROCKET  SCIENCE  TO 
BUILD  EDUCATIONAL  TOOLS? 

The educational software market is largely
dominated by cartoon movies, animations,
and games. Most of these media are
produced with multimedia effects as the
paramount design goal in the developers’
minds. Although many claim to offer
computer models, most are insufficiently
intelligent to have the desired predictive
power. For example, an animation that the
user cannot change has only illustrative
power but no predictive power at all. An
interactive model or game designed to have
a limited number of outcomes scripted by the
developer can explain the preset causality
but nothing beyond. In making the rules for
determining the outcomes, many developers
seldom perceive a need to exploit the
“rocket science”: advanced mathematics
and computation based on first principles in
science. In the following, I will explain why
there is such a need.

A first  principle  is  a foundational  scientific 
law  from  which  many  phenomena  can  be 
explained  and  many  propositions  can  be 
derived. For example, Newton’s equation of
motion is the first principle in classical
mechanics: everything in the domain of classical
mechanics can be explained by solving it
analytically or numerically. The classical
molecular  dynamics  method  that  powers  the 
computational  experiment  shown  in  the  pre- 
vious sections  is  based  on  solving  Newton’s 
equation  of  motion  for  a system  of  interact- 
ing particles  that  model  atoms  and  mole- 
cules. It  is  responsible  for  all  the  simulations 
in  the  computational  experiment  that  explain 
the myriad of concepts. There is no need for
students to program all those; everything
just emerges from the number crunching
done by the computational engine according
to the ruling equations.
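
To make the idea concrete, the following is a minimal sketch of a molecular dynamics loop. It is not the Molecular Workbench engine; it assumes a two-dimensional system in reduced units, a Lennard-Jones pair potential, and the velocity Verlet integrator, and the function names are hypothetical. It solves Newton's equation of motion for a few interacting particles and lets the behavior emerge from the forces alone.

```python
def lj_forces(pos, epsilon=1.0, sigma=1.0):
    """Return (forces, potential energy) for a Lennard-Jones system in 2D."""
    n = len(pos)
    forces = [[0.0, 0.0] for _ in range(n)]
    potential = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dx = pos[i][0] - pos[j][0]
            dy = pos[i][1] - pos[j][1]
            r2 = dx * dx + dy * dy
            s6 = (sigma * sigma / r2) ** 3          # (sigma/r)^6
            potential += 4.0 * epsilon * (s6 * s6 - s6)
            # Force magnitude divided by r, so that F_vec = f_over_r * (dx, dy)
            f_over_r = 24.0 * epsilon * (2.0 * s6 * s6 - s6) / r2
            forces[i][0] += f_over_r * dx
            forces[i][1] += f_over_r * dy
            forces[j][0] -= f_over_r * dx           # Newton's third law
            forces[j][1] -= f_over_r * dy
    return forces, potential


def total_energy(pos, vel, mass):
    """Kinetic plus potential energy of the system."""
    kinetic = 0.5 * mass * sum(vx * vx + vy * vy for vx, vy in vel)
    return kinetic + lj_forces(pos)[1]


def velocity_verlet(pos, vel, mass, dt, steps):
    """Advance positions and velocities in place by `steps` time steps."""
    forces, _ = lj_forces(pos)
    for _ in range(steps):
        for i in range(len(pos)):
            for k in range(2):
                vel[i][k] += 0.5 * dt * forces[i][k] / mass   # half kick
                pos[i][k] += dt * vel[i][k]                   # drift
        forces, _ = lj_forces(pos)
        for i in range(len(pos)):
            for k in range(2):
                vel[i][k] += 0.5 * dt * forces[i][k] / mass   # second half kick


# Three atoms released from rest in a slightly stretched triangle
positions = [[0.0, 0.0], [1.2, 0.0], [0.6, 1.0392]]
velocities = [[0.0, 0.0] for _ in positions]
e_start = total_energy(positions, velocities, 1.0)
velocity_verlet(positions, velocities, 1.0, dt=0.002, steps=2000)
e_end = total_energy(positions, velocities, 1.0)
```

With a sufficiently small time step, the total energy should stay nearly constant, which is a simple check students can apply to any such engine.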

The  molecular  dynamics  method  was,  how- 
ever, not  originally  intended  for  education.  It 
was  developed  to  help  scientists  and  engi- 
neers explore  nanoscience  and  engineer 
nanosystems  [11].  The  generations  of  com- 
putational scientists  who  contributed  to  the 
theory  and  practice  of  the  method  presuma- 
bly did  not  anticipate  that  one  day  the  me- 
thod would  find  its  place  in  thousands  of 
schools  all  over  the  world.  But  this  should 
not  be  surprising  at  all.  In  fact,  science  edu- 
cation and  scientific  research  share  a com- 
mon goal:  to  understand  how  the  world 
works.  It  is,  therefore,  no  wonder  that  a re- 
search tool  can  be  successfully  converted 
into  a learning  tool. 

Perhaps  the  single  most  important  reason 
for  using  first  principles  to  build  educational 
tools  is  that  all  the  power  of  explanation, 
prediction,  and  creation  embodied  in  them 
will  then  be  given  to  every  student.  What 
else  is  more  important  in  education  than 
passing  students  the  greatest  power  and 
deepest  wisdom  brought  to  us  by  the  most 
brilliant  figures  in  the  history  of  science  and 
engineering? Now that information technology
has empowered us to deliver them
through computing, an unprecedented
opportunity to revitalize science and engineering
education using this enabling technology
is upon us.

Unfortunately,  this  opportunity  is  often  unde- 
rappreciated in  the  educational  world.  Using 
first  principles  to  build  interactive  media  is 
not  part  of  the  design  guidelines  for  the 
mainstream.  There  are  many  more  domains 
of  science  and  engineering  where  the  curri- 
culum needs  to  be  transformed  in  a way 
similar to what was described in the previous
sections. An enormous body of literature
exists on how to simulate real-world
problems by numerically solving fundamental
equations such as the Navier-Stokes
equation for fluid dynamics and the
Maxwell equations for electrodynamics and
photonics. Sadly, there has been little
investment and interest in making those
powerful methods usable by students and the
public at large.

5.0 NEW STANDARDS, NEW OPPORTUNITIES

There  is  now  a chance  for  SBES  to  prove  its 
value  in  education  at  a large  scale.  The  new 
National  Science  Education  Standards  has 
put  forward  a “more  coherent  vision”  of 
science  education  [4].  The  framework  calls 
for  educators  to  focus  on  a limited  number 
of  core  ideas  and  give  time  for  students  to 
engage  in  scientific  investigations  and 
achieve  depth  of  understanding.  It  empha- 
sizes that  learning  about  science  and  engi- 
neering involves  the  integration  of  both  con- 
tent knowledge  and  the  practices  needed  to 
engage  in  scientific  inquiry  and  engineering 
design.  It  recognizes  learning  as  an  ongoing 
developmental  progression.  Exactly  how 
this  vision  will  turn  into  actions  is  a critical 
question.  Given  the  fact  that  the  results  from 
the  1996  Standards  have  been  disappoint- 
ing [12],  the  development  of  creative  ideas 
to  implement  the  new  framework  in  the  cur- 
riculum will  be  more  important  than  ever. 

The  recommendation  of  learning  from  core 
ideas  is  not  an  overstatement.  Richard 
Feynman  once  noted:  “I  am  inspired  by  the 
biological  phenomena  in  which  chemical 
forces  are  used  in  repetitious  fashion  to 
produce  all  kinds  of  weird  effects  (one  of 
which  is  the  author).”  Indeed,  the  unity  of 
science — that  everything  can  be  derived 
from  some  basic  rules  however  their  ap- 
pearances and  representations  may  differ — 
is  probably  the  most  profound  nature  of 
science.  For  students  to  achieve  deeper 
learning,  the  curriculum  must  be  structured 
to  reflect  this  nature.  Learning  should  focus 
on  the  basic  rules  as  suggested  by  the  prin- 
ciple of  Occam’s  razor  and  science  should 
be  taught  as  a way  of  thinking  based  on 
them  rather  than  a large  collection  of  facts. 

The  idea  that  complexity  arises  from  unity  is 
the  holy  grail  of  SBES,  too.  A simulation 
program uses the same code in repetitious
fashion to produce all kinds of results to
explain the weird effects observed in the real
world. The parallelism between the inner
workings of a simulation and the conceptual
structure of knowledge it simulates makes it
an ideal cognitive tool. Being the scientific
discipline about simulations, SBES can be a
cornerstone for building the technological
foundation of the new science framework.

Figure 3: A computational experiment for
studying the inverted single (a) and
double (b) pendulums on an oscillatory

Although  the  framework  literally  stresses  the 
importance  of  simulations  as  a creative  en- 
gine that  drives  the  scientific  and  engineer- 
ing enterprise,  simulations  are  considered 
more  as  individual  expressions  of  concepts 
than  as  possible  systematic  solutions  to 
realize  the  vision.  In  fact,  a simulation  tool 
can  be  used  not  only  as  an  inquiry  tool  to 
teach  known  facts  as  shown  by  the  example 
of  the  states  of  matter,  but  also  as  a re- 
search tool  to  explore  the  unknowns.  The 
latter  provides  opportunities  to  teach  stu- 
dents to  think  and  practice  like  scientists 
and  engineers,  a wish  reiterated  in  the  new 
standards.  A simulation  tool  constitutes  a 
computational  laboratory  in  which  students 
will ask questions, identify problems, find
solutions,  and  analyze  results.  The  following 
computational  experiment  about  the  in- 
verted pendulum,  a classic  problem  in  dy- 
namics and  control  theory,  shows  an  exam- 
ple of  how  this  may  work  for  students. 

A pendulum  is  what  every  student  learns  in 
physics.  An  inverted  pendulum  is  what  we 
get  when  we  turn  it  upside  down  after  it  has 
stopped  swinging.  We  know  this  upright  po- 
sition will  not  be  stable.  Any  blow  will  knock 
the  mass  off  from  that  position.  But  there 
are  ways  to  stabilize  it.  One  way  is  to  fix  the 
pivot  on  a base  and  rapidly  oscillate  the 
base  up  and  down.  If  the  oscillation  is  sim- 
ple harmonic  motion,  the  pendulum's  motion 
is  described  by  the  Mathieu  equation,  which 
has  very  complex  solutions  that  tell  how 
high  the  frequency  and  how  large  the  ampli- 
tude should  be  in  order  to  maintain  stability. 
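
As an illustration only, the stabilization can be checked numerically without solving the Mathieu equation analytically. The sketch below is not the simulation shown in Fig. 3; it assumes a point mass on a rigid massless rod of length `length`, a pivot moving vertically as `amplitude * cos(omega * t)`, no friction, and a fourth-order Runge-Kutta integrator. With the angle theta measured from the upright position, the equation of motion is theta'' = ((g - amplitude * omega^2 * cos(omega * t)) / length) * sin(theta), which reduces to a Mathieu-type equation for small theta. The parameter values are hypothetical.

```python
import math


def simulate(amplitude, omega, theta0=0.05, length=1.0, g=9.81,
             dt=1e-4, t_end=3.0):
    """Return the largest |theta| reached; theta is measured from upright."""

    def accel(t, theta):
        # Acceleration of the pivot acts like a time-varying gravity
        pivot_acc = -amplitude * omega ** 2 * math.cos(omega * t)
        return (g + pivot_acc) / length * math.sin(theta)

    theta, vel, t = theta0, 0.0, 0.0
    max_angle = abs(theta)
    for _ in range(int(round(t_end / dt))):
        # Classical fourth-order Runge-Kutta step for (theta, vel)
        k1t, k1v = vel, accel(t, theta)
        k2t = vel + 0.5 * dt * k1v
        k2v = accel(t + 0.5 * dt, theta + 0.5 * dt * k1t)
        k3t = vel + 0.5 * dt * k2v
        k3v = accel(t + 0.5 * dt, theta + 0.5 * dt * k2t)
        k4t = vel + dt * k3v
        k4v = accel(t + dt, theta + dt * k3t)
        theta += dt * (k1t + 2 * k2t + 2 * k3t + k4t) / 6.0
        vel += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
        t += dt
        max_angle = max(max_angle, abs(theta))
    return max_angle


# Fast, small oscillation of the base versus a base at rest
stabilized = simulate(amplitude=0.1, omega=100.0)
unstabilized = simulate(amplitude=0.0, omega=0.0)
```

In the high-frequency limit, the upright position is expected to be stable roughly when amplitude^2 * omega^2 exceeds 2 * g * length, which the first set of values satisfies and the second does not. Students can vary the two parameters to locate the boundary themselves.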

An  interesting  question  that  immediately 
follows  is:  what  will  happen  if  we  invert  the 
double  pendulum?  This  is  a question  that 
would  almost  instantaneously  excite  any 
mathematician  or  physicist  who  knows  the 
importance  of  a double  pendulum  in  nonli- 
near dynamics  and  chaos  theory.  The  study 
of an inverted double pendulum may well be
worth a Ph.D. thesis. But never mind the
intimidating mathematical complexity: a
simulation tool can easily bring
students  to  where  the  mathematicians  or 
physicists  stand.  Fig.  3a  shows  a computa- 
tional experiment  designed  to  test  the  stabil- 
ity of  the  inverted  pendulum.  The  amplitude 
and  frequency  of  the  oscillation  of  the  base 
and  the  perturbation  on  the  mass  can  be 
adjusted  to  study  the  dynamic  stability.  If 
students want to test an inverted double
pendulum, they can just attach another
mass to the first mass, as shown in Fig. 3b. Can
this  chaotic  system  be  stable  in  the  face  of 
the  butterfly  effect?  In  other  words,  does 
unpredictability  necessarily  imply  instability? 

I will  leave  this  interesting  question  for  you 
to  ponder.  The  central  point  of  this  example 
is  that  it  demonstrates  the  enormous  educa- 
tional value  of  a simulation  tool  in  support- 
ing all  levels  of  scientific  investigations, 
which  in  this  case  range  from  a well-known 
problem  (a  single  pendulum)  to  a less- 
known  problem  (an  inverted  pendulum)  and 
then  to  an  unknown  problem  (an  inverted 
double  pendulum). 

6.0  ENGINEERING  DESIGN 

Engineering  is  considered  an  integral  part  of 
the  new  standards.  Engineering  design  is  a 
creative  and  iterative  process  for  identifying 
and solving problems under various constraints.
It is as central to engineering
as inquiry is to science.

Modern engineering methodologies heavily
involve SBES. Computer simulations are
often used to screen solutions to a particular
problem or optimize a design before building
the real system. One of the most successful
applications of SBES to engineering
problems is computational fluid dynamics
(CFD).

At its core, CFD involves sophisticated
numerical methods for solving the
Navier-Stokes equation, such as the finite
difference method or the finite element
method. While it may be inappropriate to teach
the nuts and bolts of these numerical
methods at the secondary level, it is desirable
to teach how engineers use these tools to
solve problems. This is similar to teaching
how to use CAD tools to design structures
without teaching the computational geometry
under the hood. In fact, simulated fluid
flows, when visualized, are intuitive enough
for students to understand. For example,
the Kármán vortex street is mathematically
complicated but probably not incomprehensible,
as similar patterns are not uncommon
in everyday life. The key is to develop
user interfaces that will make these
tools visually and manually accessible to
students and, importantly for engineering,
make it possible for students to design with
them.

Figure 4: (a) A computational experiment
for studying the Rayleigh-Bénard convection
pattern of a fluid between a cold plate
and a hot plate. (b) A computational
experiment for studying solar heating of a
house through a window.

Ironically,  while  there  are  numerous  CFD 
tools  developed  for  professionals  to  tackle 
engineering  problems,  little  has  been  done 
to  make  a CFD  tool  for  average  secondary 
students  to  learn  with  the  powerful  method. 
With all the dramatic, artistic effects of flow
that are worth the volume of a book [13], CFD
has enormous potential to bring fun and
enjoyable learning experiences to the classroom.
This potential should not be left
untapped any longer.

Figure 4 shows two computational experiments
designed using an educational CFD
tool called Energy2D, which I created to move
towards the goal of providing a versatile CFD
laboratory for students. The tool allows the
user  to  set  up  a 2D  thermal  system  such  as 
a house  and  run  CFD  simulations  to  assess 
the  energy  flow  within  it.  Ultimately,  this  tool 
will  be  integrated  into  a 3D  environment  for 
students  to  evaluate  and  optimize  their  de- 
signs for  real  world  applications  such  as  a 
green  building,  an  internal  combustion  en- 
gine, or  a cooling  system  for  a CPU.  But 
even  the  2D  simulations  show  the  richness 
of  science  and  engineering  concepts  stu- 
dents can  explore.  For  example,  the  tem- 
perature difference  between  the  hot  plate 
and  the  cold  plate  in  Fig.  4a  can  be  adjusted 
to  test  when  the  convective  pattern  be- 
comes turbulent.  The  angle  of  sunlight  can 
be  changed  to  investigate  solar  heating  of  a 
house  at  different  times  of  the  day,  as 
shown  in  Fig.  4b.  Virtual  thermometers  can 
be  placed  in  the  model  house  to  monitor 
temperature changes at any location to
check  if  a passive  solar  design  meets  the 
requirements  of  thermal  comfort.  When  the 
house  is  heated  internally,  a virtual  thermos- 
tat can  be  added  to  maintain  the  indoor 
temperature.  Students  can  evaluate  energy 
costs  under  various  conditions  and  con- 
straints. For  example,  if  the  environmental 


temperature  is  one  degree  colder,  how 
much  more  energy  will  be  needed  to  keep 
the  house  as  warm?  If  the  sun  is  shining 
into  the  house  through  a window,  how  much 
energy  can  be  saved?  With  a learning  envi- 
ronment like  this,  many  engineering  prob- 
lems and  design  challenges  can  be  posed 
to students.
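
The question of how much more energy is needed when the environment is one degree colder can be illustrated with a far simpler model than a full CFD simulation. The sketch below is not the Energy2D solver; it assumes steady one-dimensional heat conduction through a single wall, with a hypothetical conductivity and thickness, fixed indoor and outdoor surface temperatures, and an explicit finite difference scheme iterated until the temperature profile stops changing.

```python
def steady_heat_flux(t_inside, t_outside, conductivity=1.0,
                     thickness=0.2, nodes=21, tol=1e-9):
    """Heat flux (W/m^2) through a wall once the profile has settled."""
    dx = thickness / (nodes - 1)
    r = 0.4  # diffusivity * dt / dx^2; must not exceed 0.5 for stability
    temps = [t_outside] * nodes
    temps[0] = t_inside                  # indoor surface held fixed
    while True:
        new = temps[:]
        for i in range(1, nodes - 1):    # update interior nodes only
            new[i] = temps[i] + r * (temps[i - 1] - 2.0 * temps[i] + temps[i + 1])
        change = max(abs(a - b) for a, b in zip(new, temps))
        temps = new
        if change < tol:
            break
    # Fourier's law evaluated at the indoor surface
    return conductivity * (temps[0] - temps[1]) / dx


flux_now = steady_heat_flux(t_inside=20.0, t_outside=0.0)
flux_colder = steady_heat_flux(t_inside=20.0, t_outside=-1.0)
extra_flux = flux_colder - flux_now
```

Under these assumptions the heat loss grows in proportion to the indoor-outdoor temperature difference, so each additional degree costs the same extra power per unit wall area. A real house adds convection, radiation, and solar gain, which is what the 2D simulations let students explore.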

7.0  DISCUSSIONS 

In  this  section,  I would  like  to  make  a few 
further  suggestions  on  how  to  foster  SBES’s 
role  in  the  upcoming  reform  of  science  and 
engineering  education. 

7.1  Blurring  the  line  between  re- 
search and  education 

Modern  personal  computers  have  become 
fast  enough  to  run  serious  simulations  that 
involve  intense  computation.  The  ubiquity  of 
multicore  processors  in  the  near  future  will 
only  make  computers  even  more  powerful. 
How can science education benefit from
personal computers that rival the supercomputers
of only one or two decades ago?

Scientific simulation software programs, the
direct products of SBES, can capitalize on
ubiquitous multicore computing [14]. Powerful
simulation tools running on powerful
multicore computers have the potential to
become one of the most powerful scientific
investigation tools for education, just as
their supercomputing counterparts are for
scientists and engineers, only much more
accessible. Recent studies revealed that children
are born investigators, capable of reasoning
in a surprisingly sophisticated way
about the natural world based on direct
experiences with the physical environment
[15]. If easy-to-use graphical user interfaces,
or even the more modern touch interfaces,
are provided, there is fundamentally
nothing that can prevent them from becoming
amateur scientists and engineers. With
powerful simulation tools at students’ fingertips,
the line between research and education
will be blurred and science can then be
taught the way it is. When the difference
between learning and investigation diminishes,
the curriculum can become a fantastic
journey to discover the jewels of science
and engineering.

7.2  Learning  from  games  but  not 
counting  on  them 

Outside education, game developers have
adopted first principles far more quickly and
aptly. Games need to have realistic
look-and-feels in order to be competitive in a
market that always demands better realism.
Major graphics libraries already provide
excellent lighting functions. Realistic motions
and flows powered by physics engines such
as Lagoa Multiphysics and Maya Fluid Effects
are now available for animators. Real-time
physics games such as Algodoo and
Crayon Physics are making inroads into
classrooms at lightning speed. There is
a lot  to  learn  from  the  success  of  games. 

But  game  developers  are  only  interested  in 
technologies  that  can  entertain  the  player 
and are not necessarily willing to invest in
things  like  quantum  mechanics,  genome 
dynamics,  or  climate  modeling.  The  future 
of  science  and  engineering  education  can- 
not rely  on  the  good  will  of  the  game  indus- 
try. It  lies  in  the  hands  of  a strong  alliance 
between  scientists,  engineers,  and  educa- 
tors. The  SBES  community  is  among  the 
foremost  groups  that  can  lead  the  charge 
and  bridge  the  gap. 

7.3 Learning by creating simulations

One  of  the  most  important  affordances  of 
open-ended  simulation  tools  such  as  the 
Molecular  Workbench  software  is  the  ability 
for  students  to  create  their  own  simulations. 
Only  through  the  creation  process  can 
learning be maximally deepened and personalized.
This kind of simulation tool offers
an important method to implement the
theory of constructionism [6] or learning by
design [5] for the scientific field it grows
out of. A good user interface will allow
students to design any computational experiment
to test their own hypotheses. In our
field  tests  with  the  Molecular  Workbench 
software,  we  found  students  became  very 
creative  once  they  were  given  creative 
tools. We were initially concerned that students
would just copy each other’s designs
or duplicate a demo, thus invalidating the
pedagogy.  But  this  did  not  happen.  On  the 
contrary,  it  turned  out  that  a surprisingly 
high  percentage  of  students  came  up  with 
creative  solutions  that  even  professional 
scientists  had  never  thought  of. 

Learning  science  by  designing  new  simula- 
tions has  a substantial  overlap  with  learning 
the practice of SBES. Both aim at using
simulations to prove a concept or test a design.
The only difference is that the mission
of SBES professionals is to explore the
unknowns on behalf of society, whereas
students are only exploring on their own behalf
what is probably new only to themselves.
But this difference is not really fundamental,
except that a cutting-edge research
task requires a higher skill level and
a broader scope of knowledge. If an
educational tool employs true SBES, some of the
modeling  skills  and  knowledge  students 
learn  from  creating  their  own  simulations  in 
classrooms  may  end  up  transferring  into 
SBES  literacy  and  skills.  For  instance,  stu- 
dents may  learn  some  basic  data  analysis 
skills  that  are  commonly  needed  to  under- 
stand a simulation  in  both  a research  setting 
and  an  educational  setting. 

Creating  simulations  for  learning  science 
provides the necessary contextualization for
SBES  to  be  adopted  in  the  science  curricu- 
lum, as  well  as  the  driving  force  for  engag- 
ing students  and  teachers  to  pursue  SBES. 
Nothing  is  more  rewarding  than  seeing 
one’s  own  simulations  at  work.  And  nothing 
is  more  satisfying  than  seeing  one’s  own 
students  succeeding  in  doing  impressive 
work.  As  such,  students  are  more  likely  to 
be  motivated  to  learn  more  deeply  and  dig 
under  the  hood  of  SBES  in  order  to  improve 
their  own  simulations.  And  teachers  are 
more  likely  to  adopt  the  tools  if  they  see 
their  potential. 

8.0  CONCLUSIONS 

This  paper  suggests  a strategy  for  integrat- 
ing SBES  into  the  science  curriculum  using 
computational experiments as the facilitators.
How to implement the strategy under
the  conceptual  framework  of  the  new  Na- 
tional Science  Education  Standards  was 
discussed  and  substantiated  by  a number  of 
concrete  examples  in  physical  science  and 
engineering.  It  was  elucidated  that  powerful 
scientific  simulations  can  serve  as  cognitive 
tools  for  learning  science  and  engineering 
more  profoundly.  An  important  outcome  of 
adopting  computational  experiments  in  the 
science  curriculum  will  be  that  they  will  also 
provide  pathways  to  teach  the  principles 
and  practices  of  SBES. 
6.4 Digi Island: A Serious Game for Teaching and Learning Digital Circuit
Optimization 
Abstract. Karnaugh maps, also known as K-maps, are a tool used to optimize or simplify digital logic circuits. A K-map is a
graphical display of a logic circuit. K-map optimization is essentially the process of finding a minimum number of maximal
aggregations of K-map cells with values of 1 according to a set of rules. Digi Island is a serious game designed to help
students learn K-map optimization. The game takes place on an exotic island (called Digi Island) in the Pacific Ocean. The
player is an adventurer on Digi Island and will transform it into a tourist attraction by developing real estate, such as amusement
parks and hotels. The Digi Island game elegantly converts boring 1s and 0s in digital circuits into usable and unusable spaces on a
beautiful island and transforms K-map optimization into real estate development, an activity with which many students are familiar
and in which they are also interested. This paper discusses the design, development, and some preliminary results of the Digi Island game.


1.0  INTRODUCTION 

Electronic  games  are  a pervasive  aspect  of 
American  culture  and  entertainment:  as 
many  as  65  percent  of  American 
households play games [1]. With a revenue
of  $20.2  billion  in  2009  [2],  the  game 
industry  has  evolved  into  an  important 
sector  that  is  larger  than  the  film  industry. 
The  passion  for  games  can  be  exploited  for 
more  vital  purposes,  such  as  education, 
training,  and  marketing,  via  “serious 
games.”  Game-based  learning  uses 
serious  games  with  defined  learning 
outcomes  and  objectives.  The  values  of 
game-based  learning  have  been  recognized 
by organizations such as the National Science
Foundation and the National Research Council
[3-5]. NSF considers games as an
important  form  of  cyberlearning  platform 
and  technology  [6].  The  latest  ground- 
breaking game  technologies,  such  as 
Nintendo Wii and Microsoft Kinect, have
significant  impact  on  gamers,  transforming 
gameplay  into  a more  positive  and  healthy 
experience.  Now  the  gamer  demographics 
cover  every  age  group  [1]. 

Digital  circuits  are  embedded  in  almost  all 
electronic  equipment  and  devices  in  use 
today, such as computers, MP3 players,
and  digital  cameras.  Digital  circuit 


optimization,  or  simplification,  is  a process 
to  reduce  the  complexity  of  the  digital 
circuits  so  that  electronic  devices  will  have  a 
smaller  size  (thus  less  weight)  and  less 
power  consumption  (thus  prolonged  battery 
life).  Various  techniques  have  been 
developed  in  the  last  several  decades  for 
digital  circuit  optimization.  Among  them,  the 
Karnaugh  map  is  the  standard  method  to 
teach digital circuit optimization in
introductory  digital  circuit  courses  because 
its  graphical  representations  facilitate  logic 
simplification,  providing  an  intuitive  and 
systematic  way  for  circuit  optimization. 
However,  many  students  have  difficulties 
learning  circuit  optimization  using  Karnaugh 
maps  merely  because  it  is  the  first  time  for 
them  to  be  exposed  to  Karnaugh  maps  and 
class  lectures  do  not  provide  enough 
coverage  and  exercises.  A serious  game 
that  exploits  students’  interest  and  curiosity 
with  games  would  be  helpful  for  learning 
circuit  optimization  using  Karnaugh  maps. 

As  part  of  a Senior  Design  Project  at  the 
Department  of  Electrical  and  Computer 
Engineering  of  Old  Dominion  University,  the 
authors  developed  a serious  game,  Digi 
Island,  to  aid  teaching  and  learning  digital 
circuit  optimization  using  Karnaugh  Maps. 
This  paper  discusses  the  design  and 
development of the game and presents
some  preliminary  results. 

2.0  BODY 

2.1  Digital  Circuit  Optimization 
Using  K-maps 

There  are  two  different  types  of  electronic 
circuits:  analog  circuits  and  digital  circuits. 
Analog  circuits  represent  and  process 
information  in  continuous  or  analog  form, 
while  in  digital  circuits,  information  is 
represented  and  processed  in  discrete 
(most  commonly  binary)  forms.  Most 
components  of  modern  electronic  devices 
are  digital  circuits  and  the  transformation 
from analog to digital is still underway. Take
the media for storing music as an example.
The  traditional  audio  cassette  tapes  store 
music  as  analog  signals,  while  the  MP3 
music  players  that  became  extremely 
popular  in  the  last  decade  store  music  as 
digital  signals.  Compared  with  analog 
systems,  digital  systems  have  many 
advantages  in  terms  of  flexibility, 
programmability,  computational  capability, 
numerical  accuracy,  information  storage 
and  retrieval,  error  detection  and  correction, 
and miniaturization [7].

The  same  logic  function  can  be 
implemented  by  different  digital  circuits  with 
varied  complexities.  Thus,  it  is  necessary  to 
find  the  optimal  digital  circuit  with  minimum 
complexity for the desired function. This
process is called digital circuit optimization
or  simplification,  which  is  important  to 
reduce  the  size  and  weight  of  electronic 
devices  and  prolong  their  battery  life  (just 
think  about  the  evolution  of  cell  phones 
since  their  inception  in  terms  of  size,  weight, 
and  battery  life).  Circuit  optimization  is  an 
important  theoretical  concept  covered  in 
introductory  digital  circuit  courses.  Various 
techniques  have  been  developed  in  the  last 
several  decades  for  digital  circuit 
optimization, including Boolean algebraic
manipulation and minimization, Karnaugh
maps, Quine-McCluskey, Petrick's
algorithm, Espresso, and others [7]. Among
them, the Karnaugh map is the standard
method to teach digital circuit optimization in
introductory  digital  circuit  courses  because  it 
is  a graphical  representation  that  facilitates 
logic  simplification,  providing  a standard  and 
systematic  way  for  circuit  optimization. 

Karnaugh  maps,  also  known  as  K-maps, 
are  graphical  representations  of  logic 
circuits,  which  can  also  be  represented  by 
Boolean  algebraic  expressions.  The  sides 
of a K-map represent circuit inputs, while
each  cell  of  a K-map  represents  the 
corresponding  circuit  output  with  values  of  1 
or  0.  Figure  1 shows  a K-map  representing 
a circuit  with  4 inputs.  K-map  optimization 
is  essentially  the  process  of  finding  a 
minimum  number  of  maximal  aggregations 
of  K-map  cells  with  values  of  1 according  to 
a set  of  rules.  Circuit  simplification  using  K- 
maps  requires  understanding  of  several  key 
concepts,  including  implicant,  prime 
implicant,  and  essential  prime  implicant  [7- 
8].  To  find  the  optimized  expression  of  a K- 
map,  all  prime  implicants  must  be  identified 
first. The optimized expression is the logical
sum of all essential prime implicants and
any other prime implicants needed to cover
minterms not included in the essential prime
implicants. The remaining nonessential
prime  implicants  can  be  determined  using  a 
selection  rule  that  minimizes  the  overlap 
among  prime  implicants  [8].  K-maps  are 
introduced  in  introductory  digital  logic  circuit 
courses,  such  as  ECE  241  Digital  Logic 
Circuit  at  Old  Dominion  University.  Without 
understanding  and  using  K-maps 
proficiently,  students  are  likely  to  fail  in  this 
introductory  course  and  more  advanced 
digital  circuit  courses. 
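
The key concepts can be made concrete with a short program. The sketch below is not part of the Digi Island game; it uses the Quine-McCluskey style of pairwise merging, which finds the same prime implicants a student would circle on a K-map, and all function names and the example function are hypothetical. An implicant is written as a string over `0`, `1`, and `-`, where `-` marks an input that has been eliminated.

```python
from itertools import combinations


def merge(a, b):
    """Combine two implicants that differ in exactly one defined bit."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and '-' not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + '-' + a[i + 1:]
    return None


def prime_implicants(minterms, n_vars):
    """Merge repeatedly; implicants that cannot be merged are prime."""
    current = {format(m, '0{}b'.format(n_vars)) for m in minterms}
    primes = set()
    while current:
        merged, used = set(), set()
        for a, b in combinations(sorted(current), 2):
            m = merge(a, b)
            if m is not None:
                merged.add(m)
                used.update((a, b))
        primes |= current - used
        current = merged
    return primes


def covers(implicant, minterm, n_vars):
    """True if the implicant includes the given minterm."""
    bits = format(minterm, '0{}b'.format(n_vars))
    return all(p in ('-', b) for p, b in zip(implicant, bits))


def essential_prime_implicants(minterms, n_vars):
    """A prime implicant is essential if it alone covers some minterm."""
    primes = prime_implicants(minterms, n_vars)
    essential = set()
    for m in minterms:
        covering = [p for p in primes if covers(p, m, n_vars)]
        if len(covering) == 1:
            essential.add(covering[0])
    return primes, essential


def to_term(implicant, names='ABCD'):
    """Write an implicant as a product term, e.g. '00--' becomes A'B'."""
    return ''.join(n + ("'" if b == '0' else '')
                   for n, b in zip(names, implicant) if b != '-')


# Example 4-input function with 1s in cells 0, 1, 2, 3, 5, 7, 13, 15
primes, essential = essential_prime_implicants([0, 1, 2, 3, 5, 7, 13, 15], 4)
expression = ' + '.join(sorted(to_term(p) for p in essential))
```

In this example the two essential prime implicants already cover every 1 cell, so the third prime implicant is redundant and the selection rule is not needed. In general, any minterms left uncovered by the essential prime implicants would require additional prime implicants.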
Figure 1: A 4-variable K-map

2.2 Game Design

The  goal  of  the  game  Digi  Island  is  to 
provide  both  a formal  introduction  to  K- 
maps  and  an  engaging  game  setting  that 
encapsulates  the  K-maps.  To  this  end,  Digi 
Island is designed as a construction-based
strategy  game  [9-10].  The  game  has  three 
modes:  Tutorial  Mode,  Practice  Mode,  and 
Play  Mode.  The  Tutorial  Mode  provides 
several  tutorials  about  the  K-map  through 
exemplary  circuits  with  2,  3,  and  4 inputs;  it 
identifies implicants, prime implicants, and
essential prime implicants in these circuits,
illustrates the procedure of selecting
essential prime implicants and nonessential
prime implicants using the selection rule, and
finally  generates  the  Boolean  expression  for 
the  optimized  logic  circuit.  The  Practice 
Mode  first  displays  rectangles  in  K-maps 
with  2,  3,  and  4 input  variables  and  asks  the 
player  to  identify  them  as  implicants,  prime 
implicants,  or  essential  prime  implicants. 

The player then needs to find all implicants, prime implicants, and
essential prime implicants directly. User interfaces should be provided
to start and end identification of these terms. The player also needs to
generate the optimized Boolean expression. User interfaces should be
provided for the player to enter input variables, their complements, and
the logical OR operation. Both the Tutorial Mode and the Practice Mode
provide instruction and practice using the standard representations of
K-maps used in regular classrooms.

The Play Mode is the fun part of the Digi Island game. There are no
K-maps in the Play Mode; instead, the player sees a beautiful island
(called Digi Island) in the Pacific Ocean. In the game, the player is an
adventurer on Digi Island who will transform it into a tourist
attraction by developing real estate, such as amusement parks and
hotels. However, not all places can be exploited by the player; rocks
and wildlife reserves, for example, are off limits. The player is given
a map of Digi Island that labels all usable and unusable spaces. Large
buildings bring about better financial outcomes because they use the
space more efficiently. Thus, the goal of the player is to construct a
minimum number of buildings, each as large as possible, that cover all
the usable spaces while satisfying the regulations of the Pac Republic,
which exercises sovereignty over Digi Island. Some sample regulations
are listed below.

• All  the  spaces  occupied  by  each 
building  must  be  adjacent  to  each  other. 

• The  number  of  usable  spaces  (blocks) 
in  each  building  must  be  a power  of  2. 

• Sharing spaces between adjacent
buildings is allowed, but should be
minimized.
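
These regulations mirror the K-map grouping rules. The following is a minimal sketch in Python, not the game's actual code. It assumes each usable space is identified by its minterm index, that is, the integer whose binary digits are the input values of the corresponding K-map cell. Under that assumption, a set of spaces forms a legal building exactly when its size is a power of 2 and its cells differ in only log2(size) input bits, which also accounts for wrap-around adjacency at the map edges. The function name `is_valid_building` is hypothetical.

```python
def is_valid_building(cells):
    """Return True if the minterm indices in `cells` form a legal K-map group."""
    cells = set(cells)
    size = len(cells)
    # Regulation: the number of spaces must be a power of 2 (1, 2, 4, 8, ...).
    if size == 0 or size & (size - 1):
        return False
    # Collect every input bit in which any cell differs from a reference cell.
    base = next(iter(cells))
    varying_bits = 0
    for m in cells:
        varying_bits |= m ^ base
    # Regulation: adjacency. 2**k cells are mutually adjacent (a rectangle,
    # possibly wrapping around the edges) only if exactly k bits vary.
    return bin(varying_bits).count("1") == size.bit_length() - 1


print(is_valid_building({0, 1, 2, 3}))  # four spaces, two varying bits
print(is_valid_building({0, 3}))        # two spaces, but two varying bits
```

In a 4-variable map, spaces 0 and 2 sit in the first and last columns of the same row and are adjacent through wrap-around, so they form a legal building. Spaces 0 and 3 differ in two input bits and do not.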

Initially, the player has a certain amount of cash that can be used to
exploit the island and construct buildings on the usable spaces.
Depending on their importance, buildings have different values and are
represented differently, e.g., as single houses or skyscrapers. Larger
buildings are more valuable and generate more profit, and thus more
points, for the player.

In addition to the game design discussed above, the game has a number of
other requirements. The game must be deployable and playable on personal
computers and Microsoft Zune HD media players. The touch screen of the
Zune HD must be used as the input device. Sound effects must be included
to provide user feedback. Voice instructions should be provided as well.

2.3  Game  Development 

The development of the game Digi Island involved two major components: a
front end and a back end. The front end mainly contains a graphical user
interface that displays menus, tutorials, and K-maps. It has different
user input modes for K-map manipulation and gives players feedback on
their actions. The back end contains the major logic for digital circuit
optimization; that is, for an input digital circuit, the back end
generates the optimized Boolean expression, compares it with the
player's solution (answer), and provides feedback to the player. In the
following, the back end is discussed first.
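
The comparison step of the back end can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the game's C# code: each product term is assumed to be encoded as a string with one character per input variable ('1' for the variable, '0' for its complement, '-' for an absent variable), and the names `truth_table` and `check_answer` are hypothetical. Comparing truth tables rather than the expressions themselves accepts any equivalent answer, which matters because a function can have more than one minimal expression.

```python
from itertools import product


def covers(term, bits):
    """True if the product term evaluates to 1 for the given input bits."""
    return all(t in ("-", b) for t, b in zip(term, bits))


def truth_table(terms, n):
    """Output of a sum-of-products expression for all 2**n input combinations."""
    return [any(covers(t, bits) for t in terms)
            for bits in product("01", repeat=n)]


def check_answer(player_terms, reference_terms, n):
    """Return (is_correct, is_minimal) for the player's sum of products."""
    def cost(terms):
        # (number of terms, number of literals), compared lexicographically
        return (len(terms), sum(len(t) - t.count("-") for t in terms))

    same_function = (truth_table(player_terms, n)
                     == truth_table(reference_terms, n))
    is_minimal = same_function and cost(player_terms) <= cost(reference_terms)
    return same_function, is_minimal


# F(A, B) = B. The answer A'B + AB is correct but not minimal.
print(check_answer(["01", "11"], ["-1"], 2))
```

The two flags allow feedback that distinguishes a wrong answer from a correct answer that can still be simplified.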

2.3.1  Logic  Circuit  Optimization 

Since the goal of the game is to help the player learn logic circuit
optimization using K-maps, the game must know the final optimized
Boolean expression for an input logic circuit. Although K-maps are
effective visual tools for manual circuit optimization by humans, they
are not well suited to computer implementation for automatic circuit
optimization. Several other logic circuit optimization algorithms have
been developed; here we mainly discuss two widely used methods: the
Espresso algorithm and the Quine-McCluskey algorithm.

The Espresso algorithm is the de facto industry standard for circuit
optimization. It was initially created by Brayton et al. [11] and later
revised by Rudell of the University of California at Berkeley [12].
Espresso is a heuristic algorithm that is efficient in terms of memory
usage and computational complexity. Although it is not guaranteed to
produce an optimal circuit, in practice it leads to a solution that is
either optimal or very close to optimal. The source code of the Espresso
algorithm in the C programming language is available for download from
the University of California at Berkeley [13]. There were several
options for using the Espresso algorithm in the game: 1) porting the
Espresso C source code to C# in the game, 2) compiling the source code
into libraries and calling the libraries from the game, and 3) calling
the Espresso executables directly. Considering the time restriction of
this senior design project (only one semester) and the complexity of the
Espresso algorithm, it was not feasible to port the Espresso source code
from C to C#. Options 2 and 3 worked for the game running on personal
computers (PCs), but not for the game on the Zune HD player, since there
is no C compiler that can generate object code for the Zune HD platform.
Considering these obstacles, the Espresso algorithm was not deemed
suitable for this project.

The Quine-McCluskey algorithm is another widely used method for circuit
optimization [14-15]. It is designed to work similarly to the human
brain's pattern recognition. It is a systematic method that is
guaranteed to produce the optimized Boolean expression, and its tabular
form makes it suitable for computer implementation. The Quine-McCluskey
algorithm first finds all implicants with n variables (the minterms),
then combines pairs of them that differ in a single variable into
implicants with n - 1 variables, and continues this combination process
until all prime implicants are found. The algorithm then identifies all
essential prime implicants using a prime implicant chart. However, the
circuit is not yet fully optimized, as the remaining prime implicants
may still overlap. A covering procedure is then used to select a minimum
number of the remaining nonessential prime implicants in the prime
implicant chart so that the circuit function is fully covered. Unlike
the Espresso algorithm, no authoritative source code exists for the
Quine-McCluskey algorithm. Since it is a systematic method that is
straightforward to implement, the team decided to develop C# code for
the Quine-McCluskey algorithm from scratch. The Microsoft .NET Framework
and the C# programming language were used to develop the code. C# is an
object-oriented programming language developed by Microsoft and approved
as a standard by ISO.
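
The steps of the Quine-McCluskey algorithm described above can be sketched compactly. The Python code below is an illustration only; the team's C# implementation is not reproduced in the paper, and all function names here are hypothetical. Terms are assumed to be encoded as strings over '0', '1', and '-' (one character per variable), and the covering step is written as a brute-force search over subsets of the nonessential prime implicants, an assumption made for brevity that is practical only for small maps such as those with 2 to 4 variables.

```python
from itertools import combinations


def combine(a, b):
    """Merge two terms that differ in exactly one fixed bit, else return None."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and "-" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "-" + a[i + 1:]
    return None


def prime_implicants(minterms, n):
    """Repeatedly combine terms; terms that cannot be combined are prime."""
    terms = {format(m, f"0{n}b") for m in minterms}
    primes = set()
    while terms:
        merged, used = set(), set()
        for a, b in combinations(sorted(terms), 2):
            c = combine(a, b)
            if c is not None:
                merged.add(c)
                used.update((a, b))
        primes |= terms - used
        terms = merged
    return primes


def covers(term, m, n):
    """True if the term covers minterm m of an n-variable function."""
    return all(t in ("-", b) for t, b in zip(term, format(m, f"0{n}b")))


def minimize(minterms, n):
    """Return (essential prime implicants, additional primes for a full cover)."""
    primes = sorted(prime_implicants(minterms, n))
    # Prime implicant chart: for each minterm, the primes that cover it.
    chart = {m: [p for p in primes if covers(p, m, n)] for m in minterms}
    essential = {ps[0] for ps in chart.values() if len(ps) == 1}
    remaining = [m for m in minterms
                 if not any(covers(p, m, n) for p in essential)]
    candidates = [p for p in primes if p not in essential]
    # Covering step: smallest subset of nonessential primes covering the rest.
    for k in range(len(candidates) + 1):
        for extra in combinations(candidates, k):
            if all(any(covers(p, m, n) for p in extra) for m in remaining):
                return sorted(essential), sorted(extra)


# F(A, B, C) with minterms 0, 1, 2, 3, 7 reduces to A' + BC.
print(minimize([0, 1, 2, 3, 7], 3))
```

For minterms 0, 1, 2, 5, 6, 7 of a 3-variable function, no prime implicant is essential, and the covering step alone selects three of the six prime implicants. This is the case in which a covering procedure is indispensable.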

2.3.2  Game  Play 

Microsoft XNA Game Studio is a game development toolkit for Windows,
Xbox 360, Zune HD players, and Windows phones. XNA Game Studio consists
of two parts: the XNA Framework and a set of tools and templates for
game development. The XNA Framework is an extensive set of libraries for
game development based on the Microsoft .NET Framework. It encapsulates
low-level technical details so that game developers can focus more on
content and high-level development [16]. XNA provides templates for
common tasks, such as development of games, game libraries, and game
components. It also provides utilities for cross-platform development,
publishing, and deployment. Developers can make use of both the XNA
Framework and the .NET Framework in a game, the former for game-specific
tasks such as graphics rendering and input management and the latter for
more general programming tasks.

XNA Game Studio is a powerful tool for rapid cross-platform development,
and it was selected to implement the graphical user interface and game
play. XNA Game Studio 3.1 was used; the latest version at the time of
writing was 4.0 Beta. Object-oriented programming (OOP) was used, and a
number of classes were developed to represent the different game scenes,
the graphical user interface, user inputs, K-maps, and logic circuit
optimization. XNA Game Studio provides a fundamental class, Game, that
handles game logic updates and drawing. A class diagram of the game is
shown in Figure 2.


Figure 2. A class diagram of the game.

2.4  Preliminary  Results 

A preliminary prototype has been developed as a product of this senior
design project in the Spring Semester of 2010. Some screen captures of
the game are shown in Figure 3.


Figure 3. Screen captures of Digi Island: (a) Welcome screen, (b)
Tutorial, (c) Practice mode with direct implicant selection, (d)
Practice mode with equation input, (e) Play mode.


The game has been deployed on the Zune HD media player, Figure 3(a); it
has a multimedia tutorial, Figure 3(b); the player can select between
different input modes (direct implicant selection and equation input) to
enter the answer, Figure 3(c) and 3(d); and the Digi Island map is shown
in Figure 3(e).

3.0  DISCUSSION 

Even this limited prototype already demonstrates many advantages of the
game. For example, the typical K-maps in textbooks contain many
overlapping rectangles representing different prime implicants, making
them visually confusing and difficult to understand. In the Digi Island
game, each prime implicant can be selected and highlighted individually
through user interaction, leading to a much clearer representation and
better understanding. Advanced rendering techniques, such as
transparency control using alpha maps, can be used to align two K-maps
to facilitate the grouping of minterms, which is not possible using just
plain textbooks or paper-and-pencil methods.

One important principle of learning is to connect new concepts and
understanding to pre-existing knowledge [17]. The Digi Island game
elegantly converts boring 1s and 0s in digital circuits into usable and
unusable spaces on a beautiful island and transforms K-map optimization
into real estate development, an activity with which many students are
familiar and in which they are interested. The rules for K-map
optimization manifest themselves as construction regulations for real
estate development. Players will be more engaged when they deal with
real assets such as skyscrapers and amusement parks rather than blocks
of abstract 1s and 0s.

Currently,  the  K-maps  are  randomly 
generated  by  the  game  and  the  player  has 
no  control  of  the  K-map  generation.  In  the 
near  future,  we  will  add  another  mode  in 
which  the  player  can  generate  his/her  own 
K-maps.  However,  this  may  have  some 
unintended  effects,  e.g.,  the  player  can  use 
this  game  to  solve  homework  problems. 

An area of future development is to include "don't cares" in the K-maps,
which are the outputs for certain inputs that do not matter. The "don't
cares" can be treated as either 1 or 0. Including the "don't cares" in
the K-map optimization can usually further simplify the circuit.
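
The effect of "don't cares" can be seen in a small Python sketch. It is an illustration, not part of the game: cells are assumed to be written as bit strings, groups as strings over '0', '1', and '-', and the function names are hypothetical. A group may include "don't care" cells to grow larger, but it must contain at least one cell whose output is required to be 1.

```python
from itertools import product


def cells(term):
    """All input combinations (as bit strings) matched by a term such as '1-'."""
    options = [("0", "1") if t == "-" else (t,) for t in term]
    return {"".join(bits) for bits in product(*options)}


def prime_implicants_with_dont_cares(ones, dont_cares, n):
    """Maximal groups made of 1s and don't cares that contain at least one 1."""
    ones, allowed = set(ones), set(ones) | set(dont_cares)
    implicants = []
    for term in map("".join, product("01-", repeat=n)):
        group = cells(term)
        if group <= allowed and group & ones:
            implicants.append(term)
    # A group is prime if no other valid group strictly contains it.
    return sorted(t for t in implicants
                  if not any(cells(t) < cells(u) for u in implicants))


# F(A, B) is 1 only for AB = 11; the input AB = 10 is a don't care.
print(prime_implicants_with_dont_cares({"11"}, set(), 2))   # without don't cares
print(prime_implicants_with_dont_cares({"11"}, {"10"}, 2))  # with don't cares
```

Without the "don't care", the only group is the single cell 11 (the term AB). With it, the group grows to '1-' and the expression simplifies to A.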

The game can be further expanded with a multiplayer mode for more
competitive and engaging gameplay. Players can face off against each
other to see who can solve a K-map the fastest. With networking, a game
server can also be set up to store player configuration and performance.
The game can be ported to smartphones running Windows Phone 7 with
minimal effort.

4.0  CONCLUSION 

K-maps are an important tool for teaching and learning digital circuit
optimization and simplification, which is critical for reducing the
physical size and power consumption of electronic devices and prolonging
their battery life. This paper discussed the design and development of
Digi Island, a serious game for teaching and learning K-map
optimization. The game was developed as a product of a senior design
project at Old Dominion University. Digi Island is a fun and engaging
game that offers many advantages over traditional methods of teaching
K-maps, and it will be further expanded and enhanced in the near future.
• Introduction

• Digital  Circuit  Optimization  Using  K-maps 

• Game  Design 
- Logic Circuit Optimization

- Game  Play 

• Results 

• Discussion  and  Conclusion 
Digital Circuits and Optimization

• Digital circuits are present in almost all
electronic equipment and devices today

- Computers,  MP3  players,  digital  cameras,  etc. 

• Optimization  of  digital  circuits  reduces 
complexity. 

- Less  weight,  longer  battery  life 
Necessity of Optimization

• Same  logic  function  can  have  multiple  solutions 
of  varying  complexities. 

• Necessary  to  find  the  solution  that  has  the  least 
complexity. 

• Reduction  in  complexity  allows  for  reduction  in 
weight  and  size  of  circuits,  and  increase  in 
battery  life. 

• Process  to  find  the  least  complex  circuit  is 
known  as  digital  circuit  optimization. 
Methods of Optimization

• Various  methods  of  optimization  have  been 
developed  over  the  years. 

- Boolean  algebraic  manipulation  and  minimization 

- Karnaugh  maps 

- Quine-McCluskey  algorithm 

- Espresso  algorithm 

- Petrick’s  algorithm 

• Karnaugh  Map  is  the  standard  method  for 
teaching  digital  circuit  optimization. 
Karnaugh Map Overview

Karnaugh  maps  (K-maps)  graphically  represent  logic  circuits 
intuitively  and  systematically. 

Sides  of  a K-map  represent  circuit  inputs. 

Each  cell  of  a K-map  represents  circuit  output  with  values  of  1 
or  0. 
K-map Optimization (1/2)

• K-map optimization is the process of finding a
minimal number of maximal aggregations of
K-map cells.

- Cells must have a value of 1.

- Grouped  according  to  a set  of  rules 

• Key  concepts  to  understanding  K-maps 

- Implicant 

- Prime  Implicant 

- Essential  Prime  Implicant 
K-map Optimization (2/2)


• To  find  the  optimized  expression,  identify  all 
prime  implicants. 

• Optimized  expression  is  the  sum  of  essential 
prime  implicants  and  prime  implicants  not 
contained  in  essential  prime  implicants. 

- Remaining  prime  implicants  determined  by  selection 
rules 

- Selection  rules  minimize  implicant  overlap. 
Digi Island Overview


• Passion  for  games  can  be  exploited  for 
educational  purposes  via  “serious  games.” 

• Goal of the game is to formally introduce
K-maps in an engaging manner.

• Designed as a construction-based strategy
game.

• Digi  Island  has  three  modes 

- Tutorial  Mode 

- Practice  Mode 

- Play  Mode 
Game Requirements

• Must  be  playable  on  personal  computers  and 
the  Microsoft  Zune  HD. 

• On  the  Zune,  the  touch  screen  is  the  input 
device. 

• Sound  effects  and  voice  instructions  must  be 
included. 
Tutorial Mode

• Provides several tutorials using circuits
with 2, 3, or 4 variables.

• Identifies implicants, prime implicants, and
essential prime implicants.

• Illustrates procedure for selecting essential
prime implicants and nonessential prime implicants.

• Generates  final  Boolean  expression  for  the 
circuit. 
Practice Mode

• Displays K-maps with 2, 3, or 4 variables.

• Asks  the  player  to  identify  implicants,  prime 
implicants  and  essential  prime  implicants. 

• Player  needs  to  find  all  implicants  directly. 

• Player  must  also  generate  the  optimized 
Boolean  expression. 


MODSIM World Conference & Expo, 9/13/2010

Play Mode


• Play  mode  is  where  Digi  Island  evolves  from  a teaching  tool  to 
a fun,  engaging  game. 

• Player is an adventurer on Digi Island who is challenged to
develop the island into a tourist attraction by developing real
estate.

• Player cannot utilize the entire island.

- Rocky terrain

- Wildlife preserves

- Regulations from the local government

• Player  must  maximize  available  space  with  buildings  to 
produce  the  greatest  income. 
Game Components

• Game development contained front end and
back  end  components. 

• Front  end  component  deals  with  graphical  user 
interface. 

- Menus,  tutorials,  K-maps 

- Different  user  input  modes  and  user  feedback 

• Back  end  component  contains  major  logic  for 
digital  circuit  optimization. 

- Generates  optimized  Boolean  expression(s) 

- Compares  possible  solutions  to  player’s  answer 
Logic Circuit Optimization

• To  effectively  teach  students  about  K-maps,  the 
game  must  know  the  final  optimized  expression. 

• K-maps work well visually, but do not lend themselves well to
implementation on a computer.

• Several  algorithms  have  been  developed  for 
logic  circuit  optimization. 

- Quine-McCluskey  algorithm 

- Espresso  algorithm 
Espresso Algorithm

• Heuristic algorithm that is highly efficient

• Doesn’t  guarantee  optimized  circuit,  but  close. 

• Available  from  University  of  California  at 
Berkeley. 

• Not feasible, as implementation options were either
too time-consuming or incompatible with the Zune
HD.

Quine-McCluskey Algorithm

• Uses  pattern  recognition  similar  to  human  brain. 

• Guarantees  optimal  solution,  but  inefficient. 

• Team  developed  code  for  the  algorithm  from 
scratch. 

- Microsoft  .NET  Framework 

- C#  programming  language 
Programming Tools

• Game play developed using Microsoft’s XNA Game Studio,
which supports cross-platform development and includes

- Tools and templates for rapid game development

- Optimized libraries based on Microsoft .NET Framework

• XNA  Game  Studio  provides  “Game”  class  to  handle  game 
logic  updates  and  drawing. 

• Utilized  object  oriented  programming  to  develop  a number  of 
classes. 
Class Diagram

Preliminary Results


• Prototype  deployed  on  the  PC  and  Zune  HD. 

• Tutorial  and  practice  modes  completed,  with 
minor  aesthetic  changes  possible. 

• Quine-McCluskey  algorithm  developed  from 
scratch. 

• Play  mode  yet  to  be  implemented. 
Zune Implementation

Tutorial Mode

Practice Mode

Advantages of Digi Island

• Textbook  examples  of  K-maps  are  visually 
confusing,  which  renders  them  difficult  to 
understand. 

• Digi  Island  allows  users  to  highlight  specific 
answers,  leading  to  clearer  understanding. 

• Digi  Island  converts  K-maps  into  an  island  that 
players  transform  to  earn  points,  which  engages 
learning. 
Future Developments

• Include “Don’t Cares”, or outputs that do not
matter and can be either 0 or 1.

• Multiplayer  mode 

• Port  to  smartphones  with  Windows  Phone  7. 
Conclusion

• K-maps  are  important  for  teaching  digital  circuit 
optimization. 

• Digital  circuit  optimization  is  necessary  to  reduce 
size  and  increase  battery  life  of  circuits. 

• Digi  Island  is  a fun,  engaging  game  that  offers 
many  advantages  over  traditional  teaching. 

• Digi  Island  will  be  further  enhanced  and 
expanded. 
Thank you!

6.5 Longitudinal Study: Efficacy of Online Technology Tools for Instructional Use


Michael  D.  Uenking 
Thomas  Nelson  Community  College 
uenkinam@tncc.edu

Studies show that the student population (secondary and post-secondary) is becoming increasingly technologically savvy.
Use of the Internet, computers, MP3 players, and other technologies, along with online gaming, has increased so tremendously among
this population that it is creating an apparent paradigm shift in the learning modalities of these students. Instructors and
facilitators of learning can no longer rely solely on traditional lecture-based lesson formats. To achieve student academic
success and satisfaction and to increase student retention, instructors must embrace the various technology tools that are available and
employ them in their lessons. A longitudinal study (January 2009-June 2010) has been performed that encompasses the use of
several technology tools in an instructional setting. The study provides further evidence that students not only like the tools
being used, but prefer that these tools be used to supplement and enhance instruction.


1.0  INTRODUCTION 

Technology  is  becoming  more  prevalent  in 
our  society  and  our  regular  day-to-day 
activities.  With  online  chat,  video,  and  other 
tools,  we  are  bridging  the  gap  between 
peoples  of  different  cultures,  ethnic 
backgrounds,  and  languages  without  having 
to  spend  thousands  of  dollars  to  travel  to 
foreign  countries.  With  tight  budget 
constraints,  businesses  are  choosing  to 
conduct  meetings  using  these  and  other 
tools  instead  of  sending  their  employees 
overseas  to  have  face-to-face  conferences. 

Even children of today are able to fight a “virtual” war or play
“virtual” sports using an online gaming system (e.g., Xbox, PlayStation,
Wii) and team up with people all across the world to accomplish their
various tasks and missions. As time progresses and these children grow
up in their respective countries using these and other computer
technologies, they are becoming increasingly technologically savvy.

But is this possibly creating a paradigm shift in the way these children
are learning? Are educators encountering difficulties teaching children
of the 21st century, especially if they do not use technology in their
classrooms? Could this possibly be contributing to some of the
behavioral problems that educators are facing? The answers to all of
these questions are not entirely clear. What is clear is that students
enjoy having technology as part of the learning experience, and
educators find that technology provides them with a rewarding experience
as well. At the very least, according to Reference [17], learning in an
online environment has been overwhelmingly shown to be just as effective
as learning in traditional classrooms (Tallent-Runnels et al., 2006,
Spring, p. 116).

The  following  report  encapsulates  a 
longitudinal  study  that  occurred  from 
January  2009  to  June  2010  in  which  an 
online  tool  (Adobe®  Captivate®)  was  used 
to  conduct  a mechanical  engineering 
technology  lesson.  Quantitative  data  was 
collected  from  the  students  and  qualitative 
data  was  collected  from  fellow  instructors 
during  this  timeframe.  The  next  section  will 
provide  the  body  of  this  report. 

2.0  BODY 

The  first  main  section  is  a literature  review 
that  imparts  the  background  for  this  study. 
This  section  will  be  broken  down  into  the 
following  sub-sections:  effects  of  online 
gaming,  characteristics  of  an  online  student, 
advantages  of  online  learning,  and  issues 
that  exist  with  online  learning  tools.  The 
second  section  will  discuss  the  method 
used  in  the  study.  The  third  section  will 
provide a brief description of the participants
used  during  the  study.  The  fourth  and  fifth 
sections  will  introduce  the  reader  to  the 
quantitative  analysis  and  results, 
respectively.  Finally,  the  sixth  section  will 
provide  some  qualitative  comments 
provided  by  the  instructors  who  were  given 
a chance  to  review  the  modules. 

2.1  Literature  Review 

2.1.1  Effects  of  Online  Gaming 

As mentioned previously, there are children across the world who are
engaged in various forms of online gaming. They use various gaming
devices, including personal computers and commercially available gaming
consoles and services such as Xbox Live, PlayStation, Wii, or others.
Whenever a child engages with these online gaming systems, it is evident
that learning is also occurring. Not only do the children have to learn
how to use the system, but embedded within the individual games are
certain techniques, skills, and strategies that must also be learned in
order for the student to become proficient in the game and more
competitive with and against other players in the system. So, if
learning is truly occurring in the gaming world, how is it being
translated to the real world? More specifically, how is it being
translated in the educational environment of these students?

Reference  [15]  provides  a detailed  study 
that  addresses  this  very  question.  This 
study  provided  the  following  results,  in  that 
online  gaming: 

• provides  learners  multiple  avenues 
of  support  and  communication; 

• provides  learners  opportunities  to 
access  vital  information  via  social 
networks  and  construct  knowledge 
as  the  result  of  social  collaboration; 

• promotes  deliberate,  functional 
epistemology  toward  the  acquisition 
of  knowledge  and  the  development 
of  performance; 


• affords  various  degrees  and  types  of 
interactivity,  each  supporting  the 
development  of  expertise  in  unique 
and  interesting  ways; 

• and  provides  a structured  context 
intended  to  promote  the  necessary 
skills  to  accomplish  complex,  goal- 
based  tasks  (Schrader  & McCreery, 
2008,  December,  pp.  570-571). 

The  study  also  states  the  individual  learners 
“are  empowered  through  a dynamic, 
interconnected  process  that  scaffolds  both 
technological  skills  sets  and  content 
knowledge  [which,  in  turn,]  provide 
substantial  support  and  developmental  tools 
for  focused  goal  oriented  learning  at  all 
levels  of  expertise”  (p.  571).  So,  it  is  easy 
to  see  how  students  of  today  are  using 
technology  to  not  only  provide  cognitive 
engagement,  but  how  that  same  technology 
also  enhances  their  higher  order  thinking 
skills.  What  characteristics  are  then 
commonplace  in  these  types  of  students 
who  are  now  becoming  online  learners  and 
are  engaging  in  online  learning 
environments? 

2.1.2  Characteristics  of  an  online 
student 

References [2], [3], [11], and [17] all agree that a successful online
learner is one who is already proficient in the basic use of a computer
and either has prior online learning experience or is proficient with
the Internet and various online tools (Cramer, Cramer, Fisher, & Fink,
2008, December, p. 35; Dabbagh and Bannan-Ritland, 2005, p. 39; Menchaca
& Bekele, 2008, pp. 246-249; Tallent-Runnels et al., 2006, Spring, p.
116). Reference [3] provides an even more detailed description of the
ideal online learner:

• Exhibiting  a need  for  affiliation 

• Understanding  and  valuing 
interaction  and  collaborative  learning 

• Possessing  an  internal  locus  of 
control 

• Having  a strong  academic  self- 
concept 
• Having experience in self-directed
learning or the initiative to acquire
such skills (Dabbagh and Bannan-Ritland,
2005, p. 39)

With  these  skills  being  applied  in  the  online 
environment,  there  are  definitely  some 
advantages  of  learning  online. 

2.1.3  Advantages  of  Learning  Online 

There  are  many  advantages  to  learning 
online  which  are  as  follows: 

• References [4], [8], [9], [10], [11],
[13], [14], and [19] all show that
online learning contributes to not
only higher achievement rates, but
also to higher satisfaction levels,
and higher levels of engagement
(D’Arcy, Eastburn, & Bruce, 2009,
Winter, p. 62; Jackson et al., 2006,
May, p. 433; Krentler & Willis-Flurry,
2005, July/August, p. 319; Lim, Kim,
Chen, & Ryder, 2008, June, p. 119;
Menchaca & Bekele, 2008, pp. 246-249;
Rogers and Cox, 2008,
January/February, p. 38; Saade &
Kira, 2004, Winter, p. 362; Wang &
Reeves, 2007, p. 190).

• Reference [2] states that students
may feel “more connected, more
challenged, and more engaged in
learning than ever before... self-confidence
[can also be developed
as well]” (Cramer, Cramer, Fisher, &
Fink, 2008, December, p. 35).

• From a more learning theory-based
approach, online learning also,
according to Reference [7], helps to
support the “constructivist learning”
modality “which encourage, and are
focused on, users creating, or
constructing, their own content”
(Hsu, 2007, p. 71). These tools also
“emphasize student interaction,
group learning, and collaboration,
rather than the more traditional
classroom mode... [especially]
where the emphasis is on student
communication, where students
have access to technology, and
where creative output and thinking
is encouraged” (p. 85). Reference
[5] also points out the need for this
“constructivist learning” environment
to be more “learner-centered” as
well (Hannum, Irvin, Lei, & Farmer,
2008, November, p. 223).

• References  [6],  [11],  and  [20]  also 
address  the  fact  that  learning  online 
provides  more  flexibility  of  where 
and  when  the  learning  will  occur 
(i.e.  home,  work,  vacation,  or  on 
travel  for  business)  and  more 
specifically,  for  rural  areas  where  a 
traditional  instructor  is  hard  to 
acquire  (Hannum,  Irvin,  Banks,  & 
Farmer,  2009,  pp.  13-14;  Menchaca 
& Bekele,  2008,  pp.  246-249;  Zhao, 
Alexander,  Perreault,  Waldman,  & 
Truell,  2009,  pp.  210-211). 

• References [1], [4], and [14]
address the efficacy of using online
quizzes in that not only do they
provide repetition, but also instant
feedback to the students and, in
turn, they better prepare the
students for unit exams. Faculty
and students are also able to focus
on discussion and hands-on
activities (Bartini, 2008, p. 10;
D’Arcy, Eastburn, & Bruce, 2009,
Winter, p. 57; Saade & Kira, 2004,
Winter, p. 361).

2.1.4  Issues  with  Learning  Online 

References [11], [12], and [17] promote the
notion  that  it  is  important  for  students  to 
have  some  sense  of  community  whether  it 
is  a face-to-face  contact  session  or  some 
means  to  make  connections  with  the  faculty 
and  their  peers.  This,  in  turn,  helps  to 
enhance  the  learning  process  (Menchaca  & 
Bekele,  2008,  pp.  246-249;  Nicholas  & Ng, 
2009,  p.  323;  Tallent-Runnels  et  al.,  2006, 
Spring, p. 116). The main issue that many
students  have  is  in  how  the  online  course  is 
formatted  and  designed;  so  it  is  important, 
according  to  References  [10],  [11],  [16], 
and  [19],  that  instructors  provide  means  for 
practice,  feedback,  and  improvement  for 
the  course,  that  technical  issues  are  directly 
addressed,  and  that  they  ensure  that  the 
online tools that are being used are
updated and current (Lim, Kim, Chen, &
Ryder,  2008,  June,  p.  119;  Menchaca  & 
Bekele,  2008,  p.  249;  Sitzmann,  Kraiger, 
Stewart,  & Wisher,  2006,  Autumn,  p.  654; 
Wang  & Reeves,  2007,  pp.  186-190). 

One  additional  issue  that  Reference  [18] 
mentions  is  that  there  exist  “differences  in 
perception  about  online  learning... between 
faculty  students... which  may  be  due  to  the 
heterogeneous  points  of  view  and 
motivations  for  online  learning  between 
faculty  and  students”  (Tanner,  Noser,  & 
Totaro,  2009,  p.  36).  The  next  section  will 
now  go  into  detail  about  the  study  that  was 
performed. 

2.2  Method 

In  the  Spring  2009  semester,  a three-part 
mechanical  engineering  technology  online 
module  was  developed  that  addressed  the 
three  basic  methods  of  truss  analysis  (i.e. 
method  of  joints,  method  of  sections,  and 
method  of  members).  The  students  were 
given  the  module  in  lieu  of  regular  class 
meetings  over  a ten-day  period.  They  were 
not  allowed  to  obtain  any  assistance  from 
the  instructor  during  this  timeframe  nor  were 
they  allowed  to  elicit  support  from  their 
classmates.  The  only  tools  that  they  were 
allowed  to  use  during  this  timeframe 
included  their  textbook,  a calculator,  writing 
utensils,  the  module,  and  their  respective 
computers  with  Internet  access.  All  student 
participants  were  given  a pre-test  that  was 
comprised  of  five  truss  analysis  problems 
and  were  instructed  to  not  prepare  for  it  prior 
to  the  exam.  This  pre-test  provided  a 
baseline  assessment  score  that  was 
compared  to  a final  assessment  score  in  the 
final  analysis  component.  These  two 
assessment  scores,  Likert  Scale  values,  and 
demographical  data  were  the  primary  forms 
of  quantitative  data.  A second  session  was 
attempted  in  the  Fall  2009  semester,  but 
various  college  Internet  technical  issues 
prevented  the  students  from  completing  the 
module,  so  their  data  was  removed  from 
consideration for the longitudinal study. But
the module was also presented to other
instructors via email transfer during the
Spring  2010  semester  and  was  also 
presented  at  the  2010  Virginia  Community 
College  System  (VCCS)  New  Horizons 
conference.  Qualitative  data  was  collected 
from  all  of  the  instructors. 

The  software  program  that  was  used  to 
develop  the  module  was  Adobe®  Captivate® 
which  enables  the  instructor  to  incorporate 
animation  (text  and  graphic),  PowerPoint 
slides,  user-input  text  fields,  instant  feedback 
quiz  generation  (which  can  also  send  the 
results  to  the  user’s  email  address),  music, 
and  recorded  voice.  The  program  also 
allows  the  instructor  to  create  multiple 
formats  that  can  be  incorporated  into  various 
media  outlets  (i.e.  Flash  video,  HTML,  and  a 
standalone  executable).  A snapshot  of  the 
user interface is shown below in Figure 1.


Figure  1.  Snapshot  of  Adobe  Captivate  User 
Interface 


2.3  Participants 

A total  of  ten  students  (comprised  of  three 
females  and  seven  males  whose  average 
age  was  approximately  25)  that  were 
enrolled  in  MEC131  (Applied  Statics  in 
Engineering  Technology)  participated  in  the 
study.  There  were  also  five  instructors  from 
different  colleges  across  the  United  States 
that  provided  qualitative  feedback  via  email. 
The  last  group  was  comprised  of  six 
additional  instructors  in  the  VCCS  who  were 
given  a demonstration  of  the  module,  were 
provided  results  from  the  Fall  2009  student 
data, and were given an opportunity to
interact  with  the  module.  Qualitative  data 
was  collected  for  this  last  group  as  well. 

2.4  Student  Quantitative  Analysis 

The  student  analysis  is  provided  below: 

Hypothesis:  The  average  score  of  the  post- 
test will  be  higher  than  the  average  score  of 
the  pre-test. 

Null  hypothesis:  There  is  no  difference  in 
the  average  scores  of  the  pre-test  and  post- 
test. 

Test  used:  one-tailed  t-test 

Value for alpha: α = .05

Table 1 below provides the data that were
collected for the pre-test and post-test
values:


Table 1. Student Pre-Test and Post-Test Data
Values

                     Pre-Test      Post-Test
Count:               10            10
Averages:            32.50         55.00
Median:              40.00         58.75
Std Deviation:       25.658007     30.29943
Variance:            658.33333     918.0556
Var/(number-1):      73.148148     102.0062
Total:               175.15432
Square Root:         13.234588
Estimated t-value:   1.7000907
Degrees of Freedom:  18

2.5 Student Quantitative Results

Based on the values shown in Table 1,
since the estimated t-value approaches the
table value of 1.734, it can be said that the
average test scores increased somewhat
significantly from the pre-test to the post-test
based on the module intervention
(t = 1.70, df = 18, p < .05); therefore, the
students did improve statistically in their test
scores due to the module intervention.
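
The arithmetic behind Table 1 can be retraced with a short sketch. It uses only the summary values printed in the table and follows the table's own steps (each variance divided by n - 1, summed, then square-rooted); the variable names are illustrative and not taken from the study. On these figures the estimate lands just under the quoted table value of 1.734, which is consistent with the wording that it "approaches" that value.

```python
import math

# Summary statistics copied from Table 1.
n = 10
pre_mean, post_mean = 32.50, 55.00
pre_var, post_var = 658.33333, 918.0556

# Table 1 divides each variance by (n - 1), sums the two terms,
# and takes the square root to obtain the standard error.
standard_error = math.sqrt(pre_var / (n - 1) + post_var / (n - 1))
t_estimate = (post_mean - pre_mean) / standard_error

# One-tailed table value for 18 degrees of freedom at alpha = .05,
# as quoted in the text.
t_critical = 1.734

print(f"standard error = {standard_error:.4f}")
print(f"estimated t    = {t_estimate:.4f}")
print(f"exceeds table value: {t_estimate > t_critical}")
```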

A post-test  questionnaire  was  also  provided 
that  utilized  a Likert  Scale  format.  The 
significant  results  are  provided  below: 

Question:  What  is  your  overall  feeling  about 
the  STAMINA  modules  that  you  participated 
in  for  Chapter  5? 

Answer:  70%  of  the  students  liked  the 
modules. 

Question:  Did  you  like  or  dislike  the 
addition  of  music  to  the  presentation? 

Answer:  70%  of  the  students  liked  the 
addition  of  music. 

Question:  Would  you  consider  these 
modules  to  be  excellent  tools  as  a 
SUPPLEMENT  to  your  regular  classroom 
time;  that  is,  would  these  tools  be 
considered  a great  addition  to  your  regular 
class? 

Answer:  90%  of  the  students  would 
consider  these  modules  as  an  excellent 
supplement  to  the  regular  course. 

Question:  If  these  modules  were  given  as  a 
SUPPLEMENT  to  my  MEC131  course,  I 
would  use  them  to  enhance  my  learning. 

Answer:  80%  of  the  students  would  use  the 
modules  to  enhance  their  learning  of  the 
content  material. 

Question:  In  your  opinion,  could  you  use 
these  modules  as  a STANDALONE  learning 
tool;  that  is,  could  these  modules  be  used 
instead  of  having  a regular  classroom 
environment? 

Answer:  100%  of  the  students  disagreed 
that  this  module  can  be  used  as  a 
standalone  learning  tool. 

Question: Was the user interface (Adobe
Captivate) in your browser easy to load and
navigate?

Answer: 100% of the students agreed.

2.6  Instructor  Qualitative  Results 

Several  comments  were  provided  by  the 
instructors  in  both  the  email  group  and  the 
face-to-face  group  that  provide  excellent 
feedback  for  this  study.  Some  samples  of 
the comments are provided below:

“I commend you for developing the modules
listed below and I expect it took you quite
some time to complete them. I would be
interested to learn what your plans are for
the modules going forward. I would also be
interested to know what textbook you are
currently using for this course.”

“I REALLY liked your first presentation!... I'm
going to have to learn how to create one like
it.”

“Great work here! I like the interactive and
interesting visual and experiential
components. I am developing any helps for
my statics classes to make it more
interactive.”

“Overall you have done well with the
presentations. The main benefit to students
will be with the ability to review the demos
more than once.”

“I'm not sure how well the pre-assessment
part would work, but the other stuff would be
good for students who might want some
passive learning experiences.”

“Really  like  the  graphics  and  colors  and 
user  interface..." 

“Excellent:  warm-ups,  graphics,  music..." 

3.0  DISCUSSION 

This study was limited in that the study was
only administered to one group of students
(95% confidence level, confidence interval
+/- 30.98). Also, the group was not
supervised to ensure that all three modules
were fully viewed by each of the students
even though the number of times each
student accessed the modules was
catalogued electronically. It was hoped that
the second group of students in the Fall
2009 semester would have provided an
additional set of data to increase the validity
of this study, but because of the Internet
connectivity problems, this was not
possible. Due to other commitments,
course scheduling, and time constraints, the
study was not able to be administered to
other groups of students during the Spring
2010 semester.
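
The paper does not say how the +/- 30.98 interval was obtained. One plausible reading, offered here only as an assumption, is the usual worst-case margin of error for a proportion with ten respondents at a 95% confidence level. The sketch below works that formula through; the normal quantile of 1.96 and the proportion of 0.5 are assumed inputs, not values from the study.

```python
import math

n = 10    # students in the Spring 2009 group
z = 1.96  # normal quantile for a 95% confidence level (assumed)
p = 0.5   # worst-case proportion (assumed, not stated in the paper)

# Margin of error for a proportion, expressed in percentage points.
margin = z * math.sqrt(p * (1 - p) / n) * 100
print(f"margin of error = +/- {margin:.2f}")
```

Under these assumptions the margin comes out at roughly 31 percentage points, close to the reported figure, which illustrates how little precision a single group of ten students can provide.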

Also,  an  issue  that  is  encountered  by 
engineering technology students is that this
particular  discipline  has  been  traditionally 
taught  in  a lecture  format  only.  Introducing 
technology  into  these  types  of  courses 
creates  a sort  of  paradigm  shift  in  that  not 
only  do  the  students  have  to  learn  how  to 
use  this  technology  as  an  integral  part  of 
their  learning  process,  but  engineering 
technology  instructors  will  also  need  to  learn 
how  to  incorporate  different  forms  of 
technology  into  their  curriculum  to  help 
make  their  courses  more  robust. 

4.0  CONCLUSION 

What  can  be  concluded  from  this  study  is 
that  students  not  only  liked  the  technology 
that  was  used,  but  prefer  to  have  some  form 
of  technology  to  supplement  their  learning 
experience.  This  agrees  with  the  literature 
review  that  was  previously  provided.  Test 
scores  did  in  fact  improve  significantly,  so  it 
is  possible  that  if  an  instructor  wanted  to 
use  the  module  as  a standalone  tool  for 
implementation  in  a distance  or  hybrid 
version  of  the  course,  then  there  might  be 
some  usefulness  in  doing  so  (even  though 
the  students  who  participated  in  the  study 
were  against  using  it  in  this  fashion).  It  still 
may  be  in  the  best  interest  of  the  instructor 
and  the  students  to  use  tools  like  this  to 
primarily  further  reinforce  concepts  taught  in 
the classroom. Giving students the ability to
review the video an unlimited number of
times gives them further practice in
understanding the concepts that are
provided which may, in turn, help better
prepare them for unit exams. This definitely
frees up the instructor from having to use
additional time in class to go over the same
concepts and puts the onus on the student
to take more responsibility for their own
learning experience (i.e. being more
learner-centered).

Fellow  instructors  also  liked  how  the  module 
was  designed  and  seemed  encouraged  to 
want  to  try  implementing  some  form  of 
online  tool  in  their  respective  courses. 

These  instructors  provided  helpful  feedback 
that  will  be  used  to  revise  and/or  modify  the 
modules  should  they  be  implemented  again 
for  future  sections  of  the  course. 
7.1 Leveraging Gaming Technology to Deliver Effective Training


Leveraging Gaming Technology to Deliver Effective Training

James  D.  Cimino 
D2 TEAM-Sim

jcimino@d2teamsim.com

The best way to engage a soldier is to present them with training content consistent with their learning preference. Blended
Interactive Multimedia Instruction (IMI) can be used to teach soldiers what they need to do and how to do each step, and can utilize a COTS
game engine to let them actually practice the skills learned. Blended IMI provides an enjoyable experience for the soldier, thereby
increasing retention rates and motivation while decreasing the time to subject mastery. And now mobile devices have emerged as
an exciting new platform, literally placing the training into the soldier's hands. In this paper, we will discuss how we leveraged
commercial game engine technology, tightly integrated with the Blended IMI, to train soldiers on both laptops and mobile devices.
We will provide a recent case study of how this training is being utilized, its benefits, and student/instructor feedback.


1.0  INTRODUCTION 

Motivating  soldiers  to  want  to  learn  can  be  a 
difficult  proposition.  Even  today,  the  majority 
of training is delivered by a classroom-based,
instructor-led curriculum. Classroom-based
courses typically result in one-dimensional
training, offering minimal
opportunities  to  engage  the  soldier  [6].  Even 
for  simple  training,  one-dimensional 
curriculums  usually  consist  of  a long  list  of 
PowerPoint  slides.  Soldiers  have  a name  for 
this  type  of  training,  “Death  by  PowerPoint.” 
Blended  Interactive  Multimedia  Instruction 
(IMI)  solutions,  incorporating  different  types 
of  media  and  various  levels  of  interaction, 
provide  an  engaging  alternative  to 
traditional  classroom  training.  Blended  IMI 
solutions  have  the  ability  to  motivate 
soldiers  who  require  repetitive  training  for 
collective,  individual,  and  team  performance 
tasks.  This  helps  entice  soldiers  to  learn 
tasks  which  could  otherwise  be  considered 
tedious  or  boring  [6].  Similarly,  deployed 
soldiers  can  be  engaged  to  remain 
proficient  on  individual  and  crew  oriented 
tasks  when  supplied  with  engrossing  tactical 
training  applications  on  a handheld  device. 
Handheld  applications  have  the  potential  to 
be  a powerful  contributor  in  the  process  of 
ensuring  that  soldiers  retain  fundamental 
skills  necessary  for  successful  combat 
operations. 


2.0  OPPORTUNITY 

Have  you  ever  watched  a teenager  with  a 
new  video  game?  They  do  not  read  any 
instructions;  they  simply  load  the  disk  into 
their  game  console  and  start  to  play.  Youths 
learn  without  instructors.  They  learn  through 
experience  and  both  positive  and  negative 
feedback  and  consequences.  Gaming 
technologies  have  the  added  advantage  of 
letting  the  “player”  be  in  control.  First-person 
shooter  games  have  been  around  for 
decades,  and  this  new  generation  of  soldier, 
“Generation X-Box,” wants to participate in
their  training.  They  are  more  comfortable 
with  a video  game  than  sitting  in  a 
classroom  viewing  a PowerPoint  slide  deck 
or  reading  a technical  manual. 

With upwards of 70,000 new soldiers
enlisting in the Army each year [9], it is
imperative that every opportunity be taken
to maximize access to training [3]. With
continuing operational deployments, ready
access to individual soldier tasks/collective
training (i.e. crew drills) is equally
important. Blended IMI allows for
improvements in the quality of instruction in
addition to increasing the efficiency of
creating, deploying, and managing the
instruction. With the ability to cover the
widest array of material, blended IMI
solutions can be completely web-
deliverable. Blended IMI can offer the
highest quality educational experience to
the greatest number of students.
Transferring this training to tactical
applications on a soldier’s handheld device
means better trained soldiers fully prepared
for combat.


3.0  PROOF  OF  THE  PROBLEM 

Digitized  course  material  (e.g.  PDF  files, 
PowerPoint  slides,  Word  Documents)  has 
been  in  use  for  decades.  However,  these 
are  merely  digitized  versions  of  the  existing 
course  material  which  adds  little  to  the 
effectiveness  of  the  instruction.  A blended 
approach utilizing multiple IMI levels and
types extracts value from these materials
beyond  what  traditional  classroom 
instruction  can  accomplish  [5].  The  most 
significant  benefits  of  blended  IMI  solutions 
stem  from  their  ability  to  transform  the  roles 
of  instructors  and  students.  This 
transformation  allows  for  a reduction  in  the 
amount  of  class  time  needed,  a decrease  in 
the need for travel, and a positive return on
investment for training dollars.


4.0 THE SOLUTION: A BLENDED IMI
CURRICULUM 

The  use  of  a blended  IMI  curriculum 
provides  benefits  to  four  key  areas  of  the 
modern  educational  process:  student 
interactions,  instructor  interactions, 
accessibility  and  transportability,  and  return 
on  investment. 

4.1  Student  Interaction 

The  best  way  to  reach  and  engage  a soldier 
is  to  present  them  with  content  consistent 
with  their  learning  preference.  In  general, 
most  people  can  be  characterized  as 
learning  more  effectively  from  one  of  three 
types  of  presentation:  aural,  visual,  and 
physical  kinesthetic.  Auditory  learners  are 
more  engaged  by  information  presented  in 
an  aural  format.  Visual  learners  benefit  the 
most from ocular content such as video and
text. Physical kinesthetic learners respond
better  to  information  disseminated  during 
physical  activity  [2].  Blended  solutions  can 
provide  simultaneous  delivery  of  audio, 
visual,  and  physical  kinesthetic  content 
delivered  on  a single  platform.  An  example 
of  this  is  a video  accompanied  by  voiceover, 
coupled  with  a basic  game-based  simulation 
exercise  designed  to  allow  the  student  to 
apply  the  knowledge  transferred  from 
watching  the  video. 

Advances  in  the  technology  used  to  create 
today’s  multimedia  instruction  allow  all  of 
these  formats  to  be  seamlessly  blended 
together  to  provide  a consistent  experience 
for  the  student.  The  result  is  an  efficiently 
delivered  package  that  has  content  catering 
to  each  style  of  learning.  A significant 
benefit  of  blended  IMI  is  the  reduction  in 
repetitive  material  that  covers  the  same 
topics  in  different  ways  in  order  to  engage 
different  learning  preferences.  This  reduces 
the  time  needed  to  train,  freeing  up  time  for 
additional  courses  or  practical  application 
exercises  [7]. 

Blended  IMI  can  provide  a more  enjoyable 
experience  for  the  soldier,  thereby 
increasing  retention  rates  and  motivation 
while  decreasing  the  time  to  subject 
mastery  [6].  Practical  exercises  can  be 
reproduced  through  constructive  games  [10] 
that  allow  a warfighter  to  engage  in  training 
without  fear  of  failure  or  poor  performance. 
Without  any  performance  anxiety  the  soldier 
is  better  able  to  utilize  the  training  for  the 
acquisition  of  knowledge  and  skills.  These 
constructive  games  can  be  timed  or 
objectively  scored  to  provide  feedback  for 
the  warfighter  and  a means  of  competition 
through  which  soldiers  will  challenge  one 
another.  Competition  is  an  extremely 
effective  motivator  that  is  not  only  free,  but 
entertaining.  Training  can  now  become 
something  that  soldiers  are  interested  in 
doing  on  their  own  time. 

Blended  IMI  empowers  the  warfighter  to 
take  a more  prominent  role  in  their  own 
education  by  providing  them  the  ability  to 
perform self-remediation. Questions
answered  incorrectly  can  be  linked  to  the 
area  in  the  instructional  content  that 
contains  the  correct  answer.  This  same 
principle  can  be  applied  to  activities  and 
constructive  games,  giving  the  soldiers  the 
option  to  return  to  the  instruction  to  review 
the  procedure  again  before  continuing  their 
activity. 
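
The self-remediation idea described above can be sketched in a few lines: each quiz question carries a pointer to the part of the instruction that teaches its answer, and an incorrect response yields that pointer as a review link. This is a minimal illustration only; the `Question` type, the `remediation_links` helper, and the section identifiers are hypothetical and do not describe the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    prompt: str
    correct_choice: str
    lesson_section: str  # hypothetical id of the content that teaches the answer


def remediation_links(questions, answers):
    """Return the lesson sections linked to each incorrectly answered question."""
    sections = []
    for question, answer in zip(questions, answers):
        missed = answer != question.correct_choice
        if missed and question.lesson_section not in sections:
            sections.append(question.lesson_section)
    return sections


questions = [
    Question("Which step comes first?", "A", "lesson-1/step-1"),
    Question("Which tool is required?", "C", "lesson-1/step-3"),
    Question("What is the final check?", "B", "lesson-2/step-2"),
]

# The second and third answers are wrong, so both sections are offered for review.
links = remediation_links(questions, ["A", "B", "D"])
print(links)
```

Keeping the link on the question itself means the same lookup could serve quiz items and game activities alike, which is the reuse the text describes.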

4.2  Instructor  Interaction 

The  student  is  only  half  of  the  learning 
equation.  The  experience  and  insight  of 
instructors  cannot  be  replaced  and  is  a 
valuable  part  of  the  learning  process. 
Instructors  can  provide  not  only  context  to 
the  learning  materials,  but  also  impart 
valuable  real  world  experience.  Moving  to  a 
blended  curriculum  transforms  the  role  of 
the  instructor  to  one  in  which  they  can  be 
much  more  effective. 

In  the  past,  instructors  had  to  spend  a great 
deal  of  time  doing  little  more  than  providing 
an audible version of instructional texts.
This problem can be addressed by using
blended curricula that utilize a mix of
media  types.  Creative  and  interesting  take 
home  and  web-deliverable  applications  take 
the  place  of  traditional  homework,  which 
usually  consisted  of  reading  printed 
materials.  These  enhanced  pre-course  or 
in-course  homework  assignments  increase 
the  amount  of  knowledge  that  students 
enter  class  with.  Therefore,  the  instructor 
can  spend  their  time  in  class  answering 
questions,  discussing  advanced  concepts, 
or relaying valuable personal experience [7].

Having  students  better  prepared  is  not  the 
only  benefit  to  instructors  that  a blended  IMI 
curriculum  can  bring.  Through  the  use  of 
IMI,  instructors  now  have  the  ability  to  use  a 
dashboard-like display to monitor the
progress  of  all  students  in  real-time.  This 
can  be  used  for  instructional  intervention 
when  the  instructor  notices  that  a student  is 
having  issues  with  a particular  concept  or 
step  of  a process.  Instructors  can  now  act  to 
improve critical decision making skills or
respond to infrequent, yet important,
questions  and  scenarios.  For  example,  if  the 
entire  class  is  running  a basic  constructive 
game  of  a certain  maintenance  procedure, 
only  one  of  the  students  may  have  enacted 
the  exact  set  of  circumstances  that  would 
result  in  a rare  safety  issue.  The  instructor 
could  stop  the  class  and  bring  everyone’s 
attention  to  that  student’s  scenario  to  teach 
directly  to  this  point.  An  instructor  also  has 
the  ability  to  focus  on  individual  remediation 
without  interrupting  the  rest  of  the  class. 


5.0  ACCESSIBILITY  AND 
TRANSPORTABILITY 

Full-featured,  blended  IMI  curricula  can  be 
created  to  meet  the  minimum  system 
requirements  for  home  computers  as  set 
forth  by  TRADOC’s  Army  Training  Support 
Center’s  Education  and  Training  Support 
Directorate [1]. Not only will Army school
houses  be  able  to  run  this  content,  but  most 
students  will  be  able  to  access  and  run 
these  courses  from  their  residences  or 
barracks,  at  home  or  abroad.  The  ability  to 
train  anywhere,  anytime,  on  almost  every 
computer  made  available  to  students  is  one 
of  the  most  important  benefits  to  using 
blended  IMI.  Being  able  to  include  this 
training  as  a tactical  application  on  the 
soldier’s  handheld  device  means  he  can 
take  effective  training  material  with  him 
wherever  he  is  assigned  or  whatever  the 
mission. 

With  a blended  IMI  curriculum,  web- 
deliverable  content  allows  for  updates  to 
training  materials  to  be  rapidly  deployed. 
The  modular  nature  of  most  blended 
training  solutions  also  allows  for  course 
designers  to  upload  content.  This  can 
provide  additional  context  or  relevance  to 
soldiers,  further  increasing  engagement  and 
retention [4]. Required changes to course
content  can  also  be  made  without  involving 
outside  developers. 

Mobile  devices  are  emerging  as  an  exciting 
new  platform  for  the  delivery  of  training. 
Many warfighters already have a mobile
device  of  some  sort  and  are  familiar  with 
installing  and  using  applications  developed 
for  their  mobile  devices.  Training  created  for 
mobile  devices  can  be  used  as  a 
supplement  to  existing  PC-based  training  or 
for  sustainment  purposes  once  a student 
has  completed  a course.  It  is  important  to 
make  sure  that  continuity  is  maintained 
between  mobile  and  PC  based  versions. 
This  ensures  that  students  are  presented 
with  a consistent  interface  and  presentation 
style.  The  importance  of  maintaining 
consistency  when  updating  content  or 
moving  to  new  platforms  cannot  be 
overstated. Consistency between the
versions  also  maintains  a minimal  learning 
curve  for  the  student.  Any  confusion  caused 
by  a lack  of  consistency  will  be  a detriment 
to  the  soldier's  attitude  towards  the 
instruction. 

Blended  IMI  curricula  afford  the  opportunity 
for  soldiers  to  continue  learning  and 
developing  skills  outside  the  classroom 
through  the  use  of  interactive  leave  behind 
materials.  These  materials  are  developed  in 
conjunction  with  the  blended  IMI  used  in 
class  so  that  the  student  can  apply 
knowledge,  practice  skills,  and  deepen  their 
understanding of key concepts [4]. Mobile
versions  of  these  leave  behind  materials 
allow  students  to  use  them  for  additional 
training  or  reference  anywhere  they  are  at 
any  time.  By  literally  placing  the  training  into 
the soldier’s hands, enormous gains in the
warfighter’s  initiative  to  train  can  be 
realized. 


6.0  RETURN  ON  INVESTMENT 

The  replacement  of  any  piece  of  an  IT 
infrastructure  can  be  expensive,  time 
consuming,  and  disruptive  to  normal 
operations.  A blended  IMI  solution  should 
be  designed  with  existing  equipment  in 
mind.  The  key  lies  in  using  the  appropriate 
technologies  to  create  multimedia 
presentations in order to ensure their
compatibility with the widest range of
computer  hardware  possible. 

The  web-deliverable  nature  of  the  content 
also  means  that,  in  cases  where 
appropriate,  classroom  instruction  can  be 
distributed  to  multiple  locations.  This  results 
in a decrease in travel and travel-related
costs  as  well  as  an  increase  in  the  number 
of  soldiers  that  a single  instructor  can 
manage  in  a class.  The  reduction  in  travel 
not  only  means  a direct  cost  savings  but 
also  that  soldiers  will  be  required  to  spend 
less  time  away  from  their  families. 

The  logistical  price  tag  associated  with 
training  on  any  weapons  system  can  be 
staggering.  Examples  of  the  overhead 
associated  with  these  systems  include 
storage,  fuel,  maintenance,  and 
transportation  costs.  Blended  IMI  and 
handheld  applications  lower  these  costs  by 
reducing the total number of systems
required  for  actual  “hands-on”  training. 
Soldiers  can  practice  anytime,  anywhere, 
without  needing  the  actual  equipment.  They 
can  access  instructors/subject  matter 
experts  anywhere  in  the  world  to  get 
questions  answered. 


7.0  THE  SOLUTION 

An  effective  blended  IMI  solution 
encompasses  the  entire  spectrum  of 
student  centered  instruction  from  all  four 
levels  of  IMI  to  the  production  of  each 
individual  multimedia  component.  Each 
course  needs  to  be  evaluated  for  the 
purposes  of  isolating  individual  teaching 
points,  around  which  student  centered 
interaction  can  occur.  Instructor  or  SME 
(subject  matter  expert)  input  also  needs  to 
be  incorporated  to  enhance  the  relevancy  of 
developed  IMI  content.  The  result  is  IMI 
developed  with  instructors  that  can  function 
independently  or  as  a supplement  to  in 
class  training. 

Utilizing  tools  that  allow  for  rapid 
development,  and  Commercial  off-the-shelf 
(COTS) software, the time and cost
associated  with  course  creation  are  reduced 
significantly.  Effective  blended  IMI  solutions 
should  be  designed  to  function  on  a wide 
array  of  hardware,  resulting  in  a dramatic 
reduction  in  costs  because  of  the  ability  to 
use  existing  computers  and  network 
infrastructure [1]. These solutions should
empower  Soldiers  to  access  developed 
courseware  from  their  home  PC,  laptop  or 
hand-held  device. 

The  next  generation  of  training  solutions 
should  leverage  commercial  game  engine 
technology,  tightly  integrated  with  the 
Blended  IMI  developed  to  teach  the 
associated  tactical  training  skills.  The 
Blended IMI portion uses a “crawl-walk-run”
methodology to tell the soldier what
they need to do, show them how to do each
step, and then use the game engine to
allow them to perform the skills learned.
Blended IMI is used to show the soldier
“what right looks like,” and how to perform
each  step.  During  this  portion  of  the  training, 
learning  checks  are  introduced  in  the  form 
of  multiple  choice  quiz  questions.  If  a soldier 
gets  a question  wrong  they  have  the 
opportunity  to  link  back  to  the  section  of  the 
training  to  review.  All  aspects  of  the  training 
are  tracked  and  recorded.  Every  step  will  be 
date/time  coded,  and  every  choice  (correct 
or  incorrect)  is  logged  to  aid  in  the  After 
Action  Review  (AAR)  process.  With  this 
innovative  approach,  it’s  easy  to  see  who 
completed  the  training,  skipped  through 
sections  of  the  training,  or  got  specific 
questions  wrong.  If  used  for  New  Equipment 
Training,  an  instructor  can  use  this  AAR 
data  to  assess  the  class  as  a whole,  or  to 
“zero  in”  on  individuals  who  need  more 
attention.  Soldiers  who  demonstrate  higher 
levels  of  proficiency  in  the  Blended  IMI 
training and during game play can be
“graduated” out onto the live equipment
sooner,  while  the  rest  of  the  class  continues 
to  refresh  via  the  Blended  IMI. 
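
The tracking described above can be sketched in a few lines of Python. This is a minimal illustration, not the logging format used by the fielded product: the class names, field names, and the idea of passing in a list of required steps are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TrainingEvent:
    """One date/time-coded choice made by a soldier (hypothetical schema)."""
    soldier_id: str
    step: str
    correct: bool
    timestamp: datetime


@dataclass
class AarLog:
    """Collects events so an instructor can review them after the session."""
    events: list = field(default_factory=list)

    def record(self, soldier_id, step, correct, timestamp=None):
        # Every choice, correct or incorrect, is stamped and stored.
        stamp = timestamp or datetime.now(timezone.utc)
        self.events.append(TrainingEvent(soldier_id, step, correct, stamp))

    def missed_steps(self, soldier_id):
        # Steps where this soldier answered incorrectly at least once.
        return [e.step for e in self.events
                if e.soldier_id == soldier_id and not e.correct]

    def skipped_steps(self, soldier_id, required_steps):
        # Required steps for which no event was ever logged.
        seen = {e.step for e in self.events if e.soldier_id == soldier_id}
        return [s for s in required_steps if s not in seen]

    def accuracy(self, soldier_id):
        # Fraction of logged choices that were correct (0.0 if none logged).
        own = [e for e in self.events if e.soldier_id == soldier_id]
        if not own:
            return 0.0
        return sum(e.correct for e in own) / len(own)
```

An instructor-facing report could call `missed_steps` and `skipped_steps` per soldier to decide who needs more attention and `accuracy` to decide who is ready for the live equipment; the thresholds for those decisions are left to the schoolhouse.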


Figure 1. Screenshot of PC-based Blended
IMI Solution


Figure 2. Screenshot of Handheld Gaming
Solution


The  gaming  portion  of  the  training  can  be 
timed,  and  should  mirror  the  performance 
standard  that  a soldier  is  expected  to 
achieve  in  order  to  demonstrate  proficiency 
in  executing  the  given  tasks.  The  game  can 
be  developed  such  that  multiple  soldiers 
can  work  together  in  a cooperative  manner, 
much  as  they  do  in  a real  combat  situation. 
The  game  portion  of  the  solution  should 
emulate  the  various  roles  associated  with  a 
given  set  of  tasks,  and  should  allow  the 
soldier  to  pick  their  associated  role. 

As  the  soldier  works  through  the  various 
levels  in  the  game,  he  or  she  should  be  able 
to  suspend  the  game  play  and  review  the 
appropriate training video on what is
supposed to be done at that specific step.
The  soldier  can  then  click  on  a button  to 
return  to  the  game,  and  pick  up  where  they 
left  off. 

In  order  to  improve  the  accessibility  and 
sustainability  of  the  training,  a portable 
version  of  the  game  should  be  developed. 
The mobile version of the game should
mirror the lock-step sequencing of the
PC-based game. A mobile version of the
PC-based game allows soldiers to take the
game with them and refresh their knowledge
wherever they go, whenever they want.


8.0  PROOF  OF  THE  SOLUTION:  3-6 
ADA 

My  company,  in  conjunction  with  Raytheon, 
developed  a solution  for  the  3-6  Air  Defense 
Artillery  School  to  create  a prototype 
training  application  for  the  March  Order  and 
Emplace  Crew  Drill.  The  intent  of  the 
solution  is  to  supplement  training  for  the  3-6 
ADA  schoolhouse.  Currently,  the  3-6  Air 
Defense  Artillery  schoolhouse  is  faced  with 
the  challenge  of  having  to  train  soldiers  on 
various  aspects  of  the  Patriot  Launch 
Station  platform,  but  in  some  cases  they 
have  a physical  shortage  of  equipment. 
Another  issue  facing  the  schoolhouse  is 
once  they  get  soldiers  on  the  physical 
equipment,  only  two  soldiers  can  participate 
in  a crew  drill  at  any  one  time,  leaving  the 
remaining  soldiers  to  wait  and  observe  their 
classmates.  Finally,  the  schoolhouse  was 
also  seeking  an  alternative  to  “death  by 
PowerPoint”  for  their  Advanced  Individual 
Training  (AIT). 

This  blended  IMI  solution,  complete  with  an 
interactive  game,  was  developed  to  address 
the  needs  of  the  3-6  ADA  schoolhouse. 

The  game,  called  “Launcher  Dogs:  March 
Order  & Emplace,”  has  been  developed  for 
both  PC  and  mobile  access.  Four  classes  of 
14-Tangos  have  participated  in  the  initial 
pilot  of  the  Blended  IMI  solution  and  played 
the  associated  “game,”  as  have  their 
instructors. This solution has been selected
to be part of Phase 1 of the Connecting
Soldiers to Digital Applications (CSDA) pilot.
The  audience  for  this  pilot  will  initially  be 
limited  to  soldiers  undergoing  14-T  AIT 
training  at  Fort  Sill.  The  game  is  being  used 
as  part  of  the  training  and  sustainment 
initiatives  for  the  3-6  ADA  schoolhouse.  The 
game  is  being  made  available  within  the 
schoolhouse  barracks,  classrooms,  and  via 
the  Apple  iPhone  mobile  platform. 

With Blended IMI training and the
“Launcher Dogs: March Order & Emplace”
game, AIT will require less time and
fewer resources to teach the Crew Drill, while
simultaneously improving learning, training
proficiency, interest, and long-term retention
of skills, even in the absence of tactical
equipment (experiential learning). It is
anticipated that through these training
efforts, the 3-6 ADA will:

• Decrease  the  time  to  learn  a crew  drill 
by  50% 

• Reduce  instructor  contact  hours  by  50% 

• Improve  training  proficiency  by  25% 

• Decrease  caution  and  safety  violations 
by  25% 

• Decrease  equipment  damage  due  to 
new  operator  fault  by  30% 

• Significantly  increase  interest  and 
training  motivation 

• Significantly  improve  retention  enabling 
certification  at  Soldier’s  first  unit 

• Decrease  maintenance  costs  from 
inexperienced  use  of  tactical  equipment 

During  actual  tests  of  the  training,  soldiers 
repeated  the  game  in  an  attempt  to  “better” 
their  own  time  in  comparison  to  their  battle 
buddy.  The  results  of  this  repetition  directly 
translated  to  an  improvement  in 
performance  and  proficiency  when  the 
soldiers  transitioned  to  the  live  equipment. 
Figures  3 and  4 depict  the  effectiveness  of 
the training from both the student’s and the
instructor’s perspectives.


Figure 3. Student Feedback on IMI
Effectiveness


Figure 4. Instructor Feedback on the
Effectiveness of Blended IMI


9.0  CONCLUSION 

According  to  Dr.  Roger  Smith,  CTO  of  PEO 
STRI,  “Time-on-task  is  an  important  part  of 
learning.  The  more  time  you  spend 
rehearsing,  exploring  options,  and  studying 
outcomes,  the  better  you  will  become  at  a 
skill.  Games  can  add  to  that  by  encouraging 
soldiers to spend more time learning a skill”
[9].

Games integrated into blended IMI solutions
offer benefits to every facet of tactical
training. One such benefit is an increase in
training material quality, resulting in soldiers
who are more engaged. This increased
level of engagement results in higher
retention levels and shorter time to subject
mastery. Because of IMI’s web-deliverable
nature, soldiers can continue to gain these
benefits while deployed. Blended IMI
solutions can also be leveraged
to ensure all students are prepared before
the first day of a course. In addition,
sustainment materials can keep information
fresh in a soldier’s mind or be used as a
reference after training has been completed.
Instructors also benefit from the introduction
of a blended IMI curriculum. With better
prepared students, they can focus their time
on advanced concepts and relaying
real-world experience. Solutions can be
produced that allow the instructors to
update content for increased relevancy and
accuracy. Both games and blended IMI
solutions can be designed for existing
computer hardware and handheld systems
in order to minimize the impact of adoption.
All these benefits add up to cost and time
savings to the DoD, as well as increased
educational quality and accessibility for the
warfighter.


Introduction

• Motivating soldiers to want to learn can be
difficult.

• Classroom-based  courses  offer  minimal 
opportunities  to  engage  the  soldier 

• Blended  IMI  solutions  can  motivate  soldiers 
- Repetitive  training  for: 

• Collective 

• Individual 

• Team performance tasks


Opportunity


• Have  you  ever  watched  a teenager  with  a new 
video  game? 

• Gaming  technologies  let  the  “player”  be  in 
control. 

• Upwards  of  70,000  new  soldiers  enlist  in  the 
Army  each  year 

• Blended IMI solutions can be completely
web-deliverable


Proof of the Problem

• Digitized course materials have been in use for
decades

• “DEATH  BY  POWERPOINT” 

• Adds  little  to  the  effectiveness  of  the  instruction 

• Limited  interaction 

• Not the way today’s soldier wants to learn!


Blended IMI

• A mix  of  media  types 

• Transforms  the  roles  of  instructors  and  students. 

• Extracts  value  beyond  traditional  classroom 
instruction 

• Provides  benefits  to  four  key  areas: 

- Student interactions

- Instructor interactions

- Accessibility and transportability

- Return on investment


Student interactions


• Present  content  consistent  with  their  learning 
preference 

• Three  types  of  presentation: 

- Aural 

- Visual 

- Physical  kinesthetic 

• Blended IMI empowers the warfighter to take a
more prominent role in their own education.


Instructor interactions

• Students  better  prepared 

• Instructors  can  spend  more  time: 

- Answering  questions 

- Discussing  advanced  concepts 

- Relaying  valuable  personal  experience 

• Monitor  student  progress  in  real-time 

- Instructional  intervention 

- Focus  on  individual  remediation 


Accessibility & Transportability

• The  ability  to  train  anywhere,  anytime,  on  almost 
every  computer 

• Updates  to  training  materials  can  be  rapidly 
deployed 

• Mobile devices are emerging as an exciting new
platform for the delivery of training

- It is important that training continuity between mobile
and PC-based versions be maintained


Return on Investment

• Blended  IMI  solution  should  be  designed  with 
existing  equipment  in  mind 

• Reduction  in  total  hours  of  training 

• Reduction  in  total  number  of  instructors 

• Decreases  in  travel  and  travel-related  costs 

• Lower logistical costs by reducing the total
number of systems required for actual
“hands-on” training


Case Study: 3-6 ADA


A blended  solution 
incorporating  all  4 levels  of 
IMI  with  an  After  Action 
Review  at  the  completion  of 
each module.


Two Man Crew Drill Prototype

•Reference  Material 

•Videos  of  Two  Man  Crew  Drills  provided  by 
Raytheon 

•ARTEP-44-635-Drill  Documentation 

•Crawl  - Walk  - Run 

•Conforms  to  ABCS  Style  Guide  for  PEO  C3T 
•Mirrors  ARTEP-44-635-Drill  manual 
•Provides  after-action  review 
•Simulation utilized to engage student


Two Man Crew Drill Prototype

Challenges 

• Limited  access  to  SME 

• Limited  reference  material 

• Prototypes  designed  around  materials  provided 

• Limited  access  to  funding 

Achievements 

• Over  2 hours  of  IMI 

- Video 

- Flash 

- 3D  Models  and  Character  animation 

- Interactive 

- Link-back  to  video  for  refresher 

- After-action Review

• Framework from which additional modules can be developed/deployed
quickly and cost-effectively


Two Man Crew Drill Prototype

Fort  Bliss  12  March  2009 

* Raytheon Montana St. facility

* 4 NCOs from 3-6 ADA

* Intent is to review IMI training and to determine
whether it is of sufficient fidelity and accuracy to present to
the student test group.

Achievements

* Received “Go this station” on our IMI training from instructors

* NCOs provided useful feedback and criticisms.

- Expressed a uniform belief that what we have is an extremely
valuable step that they believe soldiers will gravitate to.


Two Man Crew Drill Prototype

Fort  Bliss  13  March  2009 


• Raytheon  Montana  St.  facility 

• 12  Soldiers  from  3-6  ADA 

- No prior “hands-on” experience with the Patriot Launch Station
hardware

- Morning Session

• 8 Soldiers to be put through our IMI instruction

• 4 Soldiers to attend AIT conference training

- Afternoon Session

• Take soldiers out on equipment in Abernathy Park

- IMI soldiers on 1st Launch Station

- AIT Conference soldiers on 2nd Launch Station

• Have soldiers demonstrate what they learned


Two Man Crew Drill Prototype

Fort  Bliss  13  March  2009 
Results 

• IMI test group took to the computer-based
training “like ducks to water”

- Needed minimal instruction

- Wanted to run the training portions repeatedly

• Competition amongst soldiers to get the “Best Time”

• Soldiers  provided  useful  feedback  and 
criticisms. 


- Enjoyed  the  IMI  Training 

- Felt  they  actually  learned  something 

• At  Abernathy  Park 

- IMI test group was able to tell their
instructors  what  steps  they  needed  to 
perform 

- Control  Group  needed  to  be  told  by  their 
instructors what steps to perform


Two Man Crew Drill Prototype
Data Points:

• 75% of soldiers completed the Sim “mission”.

- 75% of soldiers were able to complete the Sim mission
on their 3rd attempt.

- 25% of soldiers completed training multiple times.

• 75% of soldiers scored 75% or higher on the initial
assessment.

• 100% of soldiers showed improvements in time
and assessment scores.


Basic IMI Screen

Interactive Game

Incorrect Selection

Video Link-Back for
Remediation


IMI vs POI Student Feedback


Expected Results


• Decrease  the  time  to 
learn  a crew  drill  by  50% 

• Reduce  instructor 
contact  hours  by  50% 

• Improve  training 
proficiency  by  25% 

• Decrease  caution  and 
safety  violations  by  25% 


• Decrease  equipment  damage 
due  to  new  operator  fault  by 
30% 

• Significantly  increase  interest 
and  training  motivation 

• Significantly  improve  retention 
enabling  certification  at 
Soldier’s  first  unit 

• Decrease  maintenance  costs 
from  inexperienced  use  of 
tactical equipment


Conclusion


• The  more  time  you  spend  rehearsing,  exploring 
options,  and  studying  outcomes,  the  better  you 
will  become  at  a skill. 

• Games & Blended IMI allow soldiers to
continue learning and developing skills outside
the classroom

• Increased engagement = higher retention levels
+ shorter time to subject mastery.

• Need to design for existing computers &
handhelds to minimize adoption impact


Questions/Comments


7.2 Optimizing Decision Preparedness by Adapting Scenario Complexity
and Automating Scenario Generation

Robb Dunne, Sae Schatz, Stephen M. Fiore, Denise Nicholson
University of Central Florida, Institute for Simulation and Training
rdunne@ist.ucf.edu, sschatz@ist.ucf.edu, sfiore@ist.ucf.edu,
dnichols@ist.ucf.edu
Jennifer Fowlkes
CHI Systems, Inc.
jfowlkes@chisystems.com

Abstract. Klein's recognition-primed decision (RPD) framework proposes that experts make decisions by recognizing
similarities between current decision situations and previous decision experiences. Unfortunately, military personnel are
often presented with situations that they have not experienced before. Scenario-based training (SBT) can help mitigate
this gap. However, SBT remains a challenging and inefficient training approach. To address these limitations, the authors
present an innovative formulation of scenario complexity that contributes to the larger research goal of developing an
automated scenario generation system. This system will enable trainees to effectively advance through a variety of
increasingly complex decision situations and experiences. By adapting scenario complexities and automating generation,
the system will provide trainees with a greater variety of appropriately calibrated training events, thus broadening their
repositories of experience. Preliminary results from empirical testing (N=24) of the proof-of-concept formula are
presented, and future avenues of scenario complexity research are also discussed.

1.0  INTRODUCTION 

Decision-making  in  the  military  has  evolved 
significantly  since  the  eras  of  the  phalanx  and 
the  Napoleonic  regiment.  Their  strict, 
hierarchical  command  and  control  is 
inappropriate  for  the  asymmetrical  conflicts 
warfighters  face  now.  Modern  warfighters, 
often  acting  in  small,  distributed  teams,  are 
expected  to  make  numerous,  rapid  decisions 
based  on  ambiguous  information,  all  the  while 
avoiding  conflicts  with  rules  of  engagement, 
mission orders, and commander’s intent.

Unfortunately,  lacking  personal  experience  to 
draw  upon,  junior  military  personnel  are  often 
ill-prepared  to  make  such  complex  decisions, 
and  the  outcomes  of  poor  decisions  may  be 
disastrous:  incorrect  actions  engaged, 
unnecessary  risks  taken,  missions 
jeopardized, or casualties received.

Military  personnel  faced  with  unfamiliar 
situations  are  at  a dangerous  disadvantage  if 
they  lack  the  necessary  decision-making 
skills.  Unique  situations  are  inherently  risky 
and  dangerous,  fertile  ground  for  poor 
decision-making  [1].  Such  situations  demand 
increased  attention,  while  draining  vital 
cognitive  resources  [2],  and  they  may 
engender  anxiety  that  can  detrimentally 
influence  decision-making  [3]. 


However,  the  more  familiar  a situation,  the 
less  risk  and  danger  involved,  and  the  better 
the decision-making. According to Klein’s
recognition-primed decision (RPD)
framework, decisions are made based upon
decision-makers’ available “pool” of
internalized, previously experienced situations
[4]. Simply, the more experiences and
situations individuals can draw from, the more
likely they are to successfully navigate
through multiple decision points. With regard
to simulation, however, more experiences do
not  necessarily  translate  to  a perception  of 
scenario  fidelity.  Multiple  experiences  within 
simulation  that  are  misaligned  to  the  trainee’s 
level  of  experience  may  not  be  perceived  as 
accurately  reflecting  the  complexity  to  be 
found  in  actuality.  In  their  experiments, 
Bradley  and  Shapiro  [5]  found  that  at  extreme 
levels  of  complexity,  when  cognitive  capacity 
was  taxed,  everything  became  more  real  to 
participants.  The  challenge  for  simulation 
then,  is  presenting  the  optimum  level  of 
complexity  to  engender  a sense  of  fidelity. 

Just  as  decision-making  in  the  military  has 
evolved,  so  have  its  methods  of  training. 
Today,  scenario-based  training  (SBT), 
defined as the purposeful instantiation of
simulated events to create desired
psychological  states  [6],  is  a widely  accepted 
instructional  approach.  One  of  the  strengths 
of  SBT  is  the  presentation  of  varied  situations 
that  allow  trainees  to  experience  real-world 
problems  prior  to  engagement. 

Systematically  presenting  training  along  a 
trajectory  similar  to  Bloom’s  revised 
taxonomy  [7],  proceeding  from  declarative 
knowledge  to  higher-order  levels  of 
abstraction  and  creation,  grounds  the  content 
and  delivery  in  well-documented  learning 
theory.  Through  such  training,  personnel  can 
efficiently  and  effectively  learn  to  integrate 
multiple  skills,  cope  with  realistic  distracters, 
practice  their  higher  order  cognitive  skills,  and 
exercise  naturalistic  decision-making  [8]. 

1.1  SBT  Technologies 

Although  SBT  can  be  delivered  in  a variety  of 
ways,  the  remainder  of  this  paper  refers  to 
scenario-based military training delivered
through  computer-generated  virtual 
environments. 

Currently,  in  the  practice  of  SBT,  scenarios 
are  chosen  for  implementation  by  a trainer 
based  on  training  objectives,  a description  of 
the  scenario,  the  trainees’  level  of 
experience,  and  timeline  requirements. 
Although  training  manuals  may  present 
sequences  of  training,  they  typically  lack 
explicit  recommendations  for  the  content  or 
sequencing  of  training  scenarios.  Thus, 
without  a clear  progression  of  scenarios, 
trainers rely on their own judgment when
determining  the  scenarios’  tasks  and  the 
order  of  presentation  of  scenarios  within  a 
set.  Consequently,  a trainer’s  sequencing 
may  not  align  with  trainees’  levels  of 
experience  or  performance,  and  mismatching 
trainees  with  training  events  can  result  in 
diminished — or  even  negative — decision 
preparedness. 

The authors have recently begun developing
an SBT system that adapts to trainees’ levels
of experience, with the aim of creating
automated methods that more effectively
advance trainees through a variety of
increasingly complex training scenarios.

To  achieve  this  goal,  the  investigators 
needed  to  operationalize  the  notion  of 
scenario  complexity.  That  is,  the  authors 
needed  to  objectively  define  the  subjective 
idea  of  scenario  difficulty.  This  objective 
formulation  is  necessary  in  order  to  develop 
the  software  algorithms  that  perform  the 
automated  instructional  adaptation. 

In  the  following  section  the  authors  present  a 
brief  definition  of  scenario  complexity,  identify 
and  describe  each  of  the  characteristics  used 
in  the  calculation  of  scenario  complexity,  and 
discuss  the  role  of  scenario  complexity  in  the 
automation  of  scenario  generation. 

2.0  SCENARIO  COMPLEXITY 

To  ensure  trainees  receive  scenarios 
appropriate  to  their  experience  level,  it  is 
crucial  to  objectively  define  and  instantiate 
scenario  complexity  so  that  computer-based 
training  software  can  automatically  assemble 
appropriate  SBT  sequences.  Successful 
instantiation  depends  on  taking  the  subjective 
and  abstract  and  making  it  objective  and 
concrete:  that  is,  creating  an  objective 
computational  metric  of  the  subjective  notion 
of  difficulty. 

The  authors  define  scenario  complexity  as 
the  objective  quality  of  a scenario,  which 
interacts  with  individual  characteristics  (such 
as  trainees'  expertise)  to  yield  an  individual’s 
perception  of  the  scenario’s  difficulty  [9]. 

Most importantly, scenario complexity is
calculated based upon three scenario
elements that are extrinsic to, rather than
intrinsic to, trainees: task complexity, task
framework, and cognitive context moderators.
To be clear, the authors purposefully refrain
from attempting to incorporate individual
perceptions. Subjective interpretation of a
task’s difficulty or an individual’s affective
state in relation to a particular characteristic is
not actionable and cannot be calculated.
Objective calculation is pursued so that
scenario complexity can be operationalized
and incorporated into software. For a detailed
description of the formula, see Dunne et al.
[10].

2.1  Scenario  Complexity 
Characteristics 

The  authors'  computational  definition  of 
scenario  complexity  is  as  follows: 

$$ SC = (TC + TF) \times CCM \qquad \text{Eq. (1)} $$

where SC = scenario complexity, TC = task
complexity, TF = task framework, and CCM
= cognitive context moderators.

Each  of  the  variables  that  comprise  the  total 
SC  can  be  manipulated  to  increase  or 
decrease  the  total  SC.  In  addition,  they  can 
be  altered  to  maintain  the  trainee  in  the  same 
complexity  range  while  at  the  same  time 
presenting  variety  within  the  scenario.  Each 
variable  is  calculated  through  individual 
functions  that  involve  the  following  sub- 
variables. 
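
As a rough illustration of Eq. (1), the sketch below computes SC from three already-scored terms. The numeric scales and the example values are assumptions made for this example only; the functions that produce TC, TF, and CCM from their sub-variables are described in Dunne et al. [10] and are not reproduced here.

```python
def scenario_complexity(task_complexity: float, task_framework: float,
                        context_moderators: float) -> float:
    """Return SC = (TC + TF) * CCM, following Eq. (1)."""
    # Assumption for this sketch: all three terms are non-negative scores.
    if min(task_complexity, task_framework, context_moderators) < 0:
        raise ValueError("complexity terms are assumed to be non-negative")
    return (task_complexity + task_framework) * context_moderators


# Two hypothetical scenarios with different mixes but the same total SC,
# i.e. variety within the same complexity range.
baseline = scenario_complexity(4.0, 2.0, 1.5)
variant = scenario_complexity(3.0, 3.0, 1.5)

# Raising only the moderator term raises the total SC.
harder = scenario_complexity(4.0, 2.0, 2.0)
```

Because CCM multiplies the sum, a change in the moderators scales the whole scenario, whereas TC and TF can be traded against each other without changing the total.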

2.1  Task  Complexity 

The  task  complexity  component  is  subdivided 
into  component  complexity  and  coordinative 
complexity. 

2.1.1  Component  Complexity 

The  component  complexity  characteristic  is 
composed of three sub-variables: the
number  of  subtasks,  required  acts,  and 
information  cues. 

First,  component  complexity  considers  the 
number  of  subtasks.  Each  training  scenario 
is  designed  around  at  least  one  task  with 
attendant  learning  objective(s).  A task  may 
stand  alone,  without  a sub-task,  but 
frequently,  tasks  include  sub-tasks  that  must 
also  be  performed. 

Second,  each  subtask  requires  one  or  more 
specific  acts.  Although  a required  act  is 
principally  a pattern  of  behavior,  novice 
trainees  internalize  the  conscious  choice  to 
engage  in  the  behavior  until  it  becomes 
automatic. Increasing the number of required
acts presents greater opportunity for
transitioning conscious behavior to
unconscious, automatized decisions.

Finally,  each  subtask  may  require  monitoring 
of  information  cues.  An  information  cue  is  a 
discrete  source  of  information  that  must  be 
monitored  and/or  processed  from  the 
environment.  The  trainee  who  is  aware  of 
these  cues  and  chooses  to  monitor  them  will 
attain  the  desired  performance  more 
efficiently  than  a trainee  who  does  not. 
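
The three sub-variables can be represented as simple counts per subtask. The sketch below is only illustrative: the `Subtask` record and the unweighted sum used as a score are assumptions made for the example, not the authors' component complexity function, which is given in Dunne et al. [10].

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    """Hypothetical record of one subtask in a training scenario."""
    name: str
    required_acts: int
    information_cues: int

    def __post_init__(self):
        # The text states that each subtask requires one or more acts;
        # monitoring information cues is optional.
        if self.required_acts < 1:
            raise ValueError("a subtask needs at least one required act")
        if self.information_cues < 0:
            raise ValueError("cue count cannot be negative")


def component_counts(subtasks):
    """Tally the three sub-variables named in the text."""
    return {
        "subtasks": len(subtasks),
        "required_acts": sum(s.required_acts for s in subtasks),
        "information_cues": sum(s.information_cues for s in subtasks),
    }


def component_complexity(subtasks):
    """Unweighted sum of the three counts (an assumption of this sketch)."""
    return sum(component_counts(subtasks).values())


# Hypothetical example: a task with two subtasks.
example = [Subtask("establish checkpoint", 2, 1),
           Subtask("screen vehicle", 3, 2)]
```

Adding a subtask, an act, or a cue raises the score by one in this simplified form; a weighted version would need the coefficients from the cited formula.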

2.1.2  Coordinative  Complexity 

Coordinative complexity is concerned with the
integration of subtasks and associated acts,
which may be necessary for successful task
completion. These subtasks are integrated
and involve synchronization of activities to
achieve the common goal or objective [11].

Without  coordination  of  these  subtasks  and 
acts,  trainee  performance  will  suffer.  By 
manipulating  the  degree  of  integration, 
trainees  are  presented  with  increasing  levels 
of  scenario  complexity  requiring,  in  turn,  a 
greater  number  of  decision  points. 

2.2  Task  Framework 

Task  framework  accounts  for  the  relation 
between  task  paths  and  the  outcome 
associated  with  each,  and  it  addresses  which 
outcomes  are  possible  in  a given  task  [12]. 

The authors suggest that the task framework
characteristic is where the interplay of decision
preparedness, performance, and complexity is
most acute. Tasks with a
single goal and a single means or path to
achieve that goal are well-defined. Tasks
with multiple goals and several possible
means or paths to achieve them are
ill-defined. Deciding whether a particular means
or path will achieve the desired goal requires
resolving existing ambiguity, that is,
estimating each path’s potential for
success. Ambiguity and complexity make it
difficult for decision-makers to determine
what the possible outcomes might be, let
alone the value they assign to them [13].
Increasing ambiguity and complexity is
therefore conjectured to be highly influential
in advancing decision preparedness.

2.3  Cognitive  Context  Moderators 

Cognitive context moderators address factors
present in the scenario that often increase
stress and distraction. They are defined as
external stimuli that affect the operator by
increasing load and reducing the cognitive
resources available for the task, thus causing less
complex decisions to appear more complex.

These  moderators  can  influence  the 
resolution  or  quality  of  the  evidence  available 
for supporting judgments. According to
Macmillan and Creelman [14], when
attempting a forced-choice judgment, for example
to identify “known or unknown”,
performance  will  be  superior  during  a clear 
daylight  encounter  compared  to  night  or 
under  hazy  conditions  because  the  available 
evidence  is  superior. 

It  must  be  reiterated  that  these  characteristics 
are  built  into  the  scenario;  it  remains  the 
decision  of  the  trainee  to  engage  in  these 
acts,  monitor  cues  and  satisfy  scenario 
criteria.  Through  presentation  of  increasing 
levels  of  complexity,  the  scenarios  contribute 
to  the  growing  pool  of  experiences  from 
which  the  trainee  will  draw. 

The  authors  suggest  automated  generation  of 
scenarios  that  take  trainees  on  an  efficient 
and  effective  trajectory  towards  optimized 
decision  preparedness  can  be  accomplished 
by  this  operationalization  of  scenario 
complexity.  To  ensure  that  adaptive 
generation  results  in  positive  outcomes, 
systematic  implementation  of  this  training 
framework must be grounded in decision-making
and learning theory.

The  following  section  describes  two  major 
theories  adaptive  scenario  generation  draws 
upon  to  increase  decision  preparedness. 


3.0  OPTIMIZING  DECISION 
PREPAREDNESS 

The  ability  to  make  timely,  appropriate,  and 
effective  decisions  is  an  essential 
competence  for  warfighters  [15].  Buch  and 
Diehl [15] found that increases in the quality
of decision-making have largely been
by-products of in-field experience. However,
they concluded that judgment and
decision-making capability can be improved through
training that incorporates situation-specific
exercises and increases the variety of
variables as training progresses.

With  an  objective  value  for  each  scenario’s 
complexity,  variables  can  be  manipulated  to 
increase  or  decrease  variety  and  complexity. 
This  manipulation  must  be  calibrated  to 
trainee  performance,  aligning  the  level  of 
complexity  associated  with  a scenario  to  the 
instructional  needs  of  trainees.  However,  in 
order  to  align  the  scenario  to  an  individual’s 
training  needs — making  the  scenario  that  is 
“just  right”  — the  simulation  must  employ  a 
systematic  instructional  methodology  [16]. 

The  following  section  describes  how  the 
instructional  methodology,  based  on  Bloom’s 
taxonomy,  is  supported  through  the 
operationalization  of  scenario  complexity  for 
SBT.  Also  discussed  is  the  role  of  automated 
scenario generation and how it utilizes
Klein’s  recognition-primed  decision-making 
framework  and,  by  extension,  Klein  and 
Baxter’s  cognitive  transformation  theory  [17] 
to  improve  decision  preparedness.  The 
section  also  describes  the  well-documented 
instructional  efficacy  of  Vygotsky’s  zone  of 
proximal  development  [16],  used  to  enable 
proper  alignment  and  sequencing  of 
scenarios. 

3.1 Recognition-primed Decision-making (RPD)

Recent  decision-making  theories  have 
focused  on  decisions  that  are  made  in 
complex  situations  with  high  stakes, 
uncertainty  and  time  pressure  [13]. 
Naturalistic decision-making theory (NDM)
attempts to explain such decisions by
suggesting that a decision-maker continually
assesses a situation in order to recognize
familiar characteristics and make judgments
[19]. NDM has been observed across
domains, such as firefighting and the military
[20], where the decision-maker initially
assesses the situation, looks for familiar
patterns or prototypes, determines which
goals make sense, identifies the relevant
cues to expect, and determines what action
will be most appropriate.

Under  the  umbrella  of  NDM  research,  the 
RPD  model  has  gained  significant  influence 
over  the  past  10  to  15  years.  This  model  is 
based  on  the  supposition  that,  in  complex 
situations,  humans  usually  make  decisions 
based  on  the  recognition  of  similarities 
between  the  current  decision  situation  and 
previous  decision  experiences.  Simply,  Klein 
proposes  that  decisions  are  made  based  on 
the  recall  of  the  consequences  of  previous 
decisions  made  in  similar  situations. 

Cognitive  studies  have  shown  that  over  95% 
of  human  decisions  conform  to  the  RPD 
model  in  time-stressed  situations  [21] — the 
very  type  of  situation  frequently  encountered 
by  military  personnel. 

An extension of RPD is cognitive
transformation theory (CTT). This theory
states  that  as  novices  progress  towards 
expertise  they  develop  “knowledge  shields” 
that  serve  to  protect  their  established 
concepts.  These  shields  can  negatively 
affect  knowledge  and  skills  acquisition  and,  in 
the  operational  theater,  may  lead  to  situations 
where  default  decisions  are  made  based  on 
biased  judgment  rather  than  experience  [17]. 

However,  Klein  and  Baxter  reference  Waller, 
Hunt,  and  Knapp  [22]  who  found  that  varied 
and  sufficient  exposure  to  virtual  training 
environments  provides  trainees  with  the 
needed  practice  to  construct  valid  mental 
models  and  alleviate  the  obstacles  created  by 
such  knowledge  shields  [17]. 


3.2  Zone  of  Proximal  Development 
(ZPD) 

The  theory  of  the  zone  of  proximal 
development  (ZPD)  is  the  scientific  basis  for 
why  training  (in  this  case,  scenario)  difficulty 
must  be  appropriately  matched  to  trainees’ 
current  skill  levels.  The  ZPD  is  the  range 
within  which  learning  and  training  tasks  are 
neither  too  hard  nor  too  easy.  According  to 
Vygotsky,  development  will  only  occur  when 
a trainee  is  confronted  by  a task  that  lies 
within  the  zone,  because  if  a task  is  too  easy 
then  no  development  will  happen  (although 
gains  in  fluency  and  accuracy  may  occur 
simply  through  repetition)  [23]  and  if  a task  is 
too  difficult  to  complete  successfully,  no 
cognitive  development  will  occur  [24]  and 
motivation  may  suffer. 

3.3  Implementation 

For illustrative purposes, let us say that during
SBT a trainee performs exceptionally well.
They  have  attained  the  goals  and 
successfully  navigated  decision  points. 
Another  trainee  has  not  performed  so  well. 
They  have  not  attained  the  goals,  and  their 
decisions  were  inadequate.  Following  the 
principle  of  ZPD  the  next  scenario  in 
sequence  should  be  neither  too  difficult  nor 
too  easy.  Due  to  such  differing  performance, 
under  today’s  SBT  approach,  it  cannot  be 
expected  that  one  scenario  will  address  the 
learning  needs  of  both  trainees.  However,  if 
the  next  scenario  in  sequence  could  increase 
in  complexity  for  the  high  performer  while 
either  maintaining  the  current  complexity 
level,  with  variation,  or  remediating  at  a lower 
complexity  range  for  the  other  trainee,  then 
the  training  should  be  more  effective  and 
efficient.  Achieving  this  is  the  authors’  goal. 

Scenario complexity ranges derived from the
computational formula developed by the
authors allow software to determine, with or
without manual input from the trainer, the
appropriate sequence of experiences for both
of these trainees. If, according to ZPD,
negative training exists outside both the
upper and lower bounds, then scenario
complexity defines the upper and lower
ranges of the complexity of the appropriately
sequenced scenario.

Aligning  scenario  sequencing  to  trainee 
performance,  therefore,  ensures  that  trainees 
are  not  presented  with  decision  points  and 
situations  which  are  too  hard  or  too  easy,  but 
remain  in  the  area  of  ideal  learning,  while 
presenting  varied  situations  and  experiences 
increasing  decision  preparedness. 
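
The sequencing rule described above can be sketched in code. This is a minimal illustration only: the normalized performance score, the two thresholds, the step size, and all names are assumptions made for the example and are not taken from the authors' formula or software.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class ComplexityRange:
    """Lower and upper complexity bounds of one trainee's current zone."""
    lower: float
    upper: float


def next_range(current: ComplexityRange, score: float,
               advance_at: float = 0.8, remediate_at: float = 0.5,
               step: float = 1.0) -> ComplexityRange:
    """Shift the zone up, down, or not at all based on performance.

    score is a normalized performance score in [0, 1]. The thresholds
    and step size are illustrative assumptions.
    """
    if score >= advance_at:
        shift = step          # high performer: raise complexity
    elif score < remediate_at:
        shift = -step         # struggling trainee: remediate lower
    else:
        shift = 0.0           # maintain the level, with variation
    width = current.upper - current.lower
    lower = max(0.0, current.lower + shift)
    return ComplexityRange(lower, lower + width)


def pick_scenario(pool: Dict[str, float], zone: ComplexityRange,
                  seen: Set[str]) -> Optional[str]:
    """Return an unseen scenario whose complexity lies inside the zone."""
    for name, complexity in sorted(pool.items(), key=lambda kv: kv[1]):
        if name not in seen and zone.lower <= complexity <= zone.upper:
            return name
    return None
```

Skipping scenarios the trainee has already seen is one simple way to keep the complexity level constant while still varying the experience.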

To establish an empirical basis for this theory of
scenario  complexity  an  initial  study  is  being 
conducted.  The  focus  of  this  study,  as  well 
as  its  design,  methodology,  and  preliminary 
findings  are  presented  in  the  following 
section. 

4.0  EXPERIMENTATION 

A pilot  study  was  conducted  to  ascertain  if  the 
researchers’  theory,  equations,  and 
hypotheses  are  usable  and  conceptually 
sound.  Incorporating  lessons  learned  from 
this  pilot  study,  any  reformulation  to  the 
original  equation  that  is  suggested  by  the 
results  will  precede  further  research.  The 
end-state  study  will  include  construction  of 
dynamic  scenario-based  training  of  varying 
levels  of  complexity  designed  by  military 
personnel. 

4.1  Design  and  Methodology 

This pilot study involved a single sample
group (N = 24) comprised of undergraduate
students  recruited  from  a large  southeastern 
university.  Each  participant  completed  four 
surveys,  which  asked  participants  to  indicate 
their  perception  of  the  complexity  levels  of 
various  situations.  These  situations’ 
complexity  levels  were  calculated  a priori  by 
the  authors,  using  the  described  scenario 
complexity  formula. 

The first series of 6 questions asked the
participants to indicate on a scale of 1-20 how
simple or complex they thought it would be to
drive a car: (Q1) in an empty parking lot, (Q2)
to a familiar destination in light traffic, (Q3) to
an unfamiliar destination in a familiar city,
(Q4) to an unfamiliar destination in an
unfamiliar city using a map, (Q5) to a familiar
destination in heavy traffic and severe
thunderstorm, and (Q6) to an unfamiliar
destination using a map, in light traffic and
mild rain.

The second series of 6 questions asked the
participants to indicate how simple or
complex they thought it would be to: juggle
three balls of different sizes (Q7), juggle two
balls and walk at the same time (Q8), juggle
two balls (Q9), juggle three balls of the same
size and walk (Q10), juggle three balls of the
same size while reciting the ABC’s (Q11), and
juggle two balls of different sizes and walk
while reciting the ABC’s (Q12).

The third series of 6 questions asked the
participants to indicate how simple or
complex they thought it would be to drive a
car: in an empty parking lot while talking on
the phone (Q13), to a familiar destination in
light traffic while talking on the phone (Q14), to
an unfamiliar destination while talking on the
phone (Q15), to an unfamiliar destination with
a map while talking on the phone (Q16), to a
familiar destination in heavy traffic and severe
thunderstorm while talking on the phone (Q17),
and to an unfamiliar destination with a map
in light traffic and mild rain while talking on
the phone (Q18).

The fourth series of questions placed the
participants in a single scenario comprised
of 7 different situations and asked them to
indicate how simple or complex they thought
each situation would be: drive to their
friend’s house (Q19), drive to their friend’s new
house where they’ve never been before (Q20),
surf (Q21), play a ping-pong toss game (Q22),
play the toss game with a fan aimed out over
the game (Q23), choose from a large menu of
pizza toppings under a time constraint (Q24),
and pick one of two sodas (Q25).

4.2  Hypotheses 

The  authors  hypothesized  that  the  scenario 
complexity  computation  calculated  by  the 
authors  would  be  similar  to  the  results 
identified  by  the  participants.  In  other  words, 
the  authors  expected  that  the  formula  would 
yield  relatively  comparable  results,  regardless 
of who assessed the described scenarios.

4.3 Results

An independent one-sample t-test was
conducted on the participant (n=24)
responses to evaluate whether their means
were significantly different from the test
values established a priori.
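
As an illustration of the test just described, the sketch below computes a one-sample t statistic for a single survey item against its a priori test value. The ratings are hypothetical and are not the study's data. The function name is also an assumption, and the comparison uses the standard two-tailed critical value of t for 23 degrees of freedom at the 95% level instead of an exact p-value.

```python
import math
import statistics


def one_sample_t(responses, test_value):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(responses)
    mean = statistics.mean(responses)
    sd = statistics.stdev(responses)  # sample standard deviation (n - 1)
    t = (mean - test_value) / (sd / math.sqrt(n))
    return t, n - 1


# Two-tailed critical value of t for df = 23 at the 95% level.
T_CRIT_DF23 = 2.069

# Hypothetical ratings from 24 participants for one item (1-20 scale).
ratings = [3, 5, 4, 6, 2, 5, 4, 3, 7, 5, 4, 6,
           3, 5, 4, 2, 6, 5, 4, 3, 5, 4, 6, 5]

# 3.3 stands in for an a priori complexity value for the item.
t, df = one_sample_t(ratings, test_value=3.3)

# True when the response mean differs significantly from the test value.
differs = abs(t) > T_CRIT_DF23
```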


In  the  graph  below  the  a priori  complexity 
levels  and  participants’  responses  are 
illustrated.  The  light  line  represents  the 
calculated  levels  while  the  dark  line 
represents  the  mean  levels  of  the  aggregated 
participant  responses. 


Z-scores  were  also  derived  to  ascertain  the 
degrees  of  agreement  among  participants’ 
values.  Z-scores  between  -1.249  and  1.249 
indicate  relative  agreement  as  to  a situation’s 
difficulty. Z-scores < -1.25 or > 1.25
represent  significant  disagreement. 
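
The banding rule above can be sketched as follows. The sketch assumes that each Z-score standardizes an item's mean response against the mean and sample standard deviation of all item means; the paper does not spell out the computation, so this reading, the function names, and the treatment of values exactly at the cutoff are all assumptions.

```python
import statistics


def z_scores(item_means):
    """Standardize each item mean against the mean and SD of all items."""
    values = list(item_means.values())
    grand_mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; an assumption
    return {item: (m - grand_mean) / sd for item, m in item_means.items()}


def classify(z, cutoff=1.25):
    """Label a Z-score using the agreement band described in the text."""
    return "agreement" if abs(z) < cutoff else "disagreement"
```

For example, `classify(z_scores(means)["Q19"])` would label item Q19, given a hypothetical dictionary `means` that maps item labels to mean responses.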

The following table shows the Z-scores for
each  item.  The  total  mean  and  standard 
deviation  are  noted  below. 


Calculated Z-scores*

Item    Response mean    Expected    t-test      Z-score
...     ...              ...         ...          1.22
19       1.7              6          p = .00     -1.69
20      13                8.6        p = .00       .52
21      13.66            20          p = .00       .65
22       8.79             2          p = .00      -.31
23      13.2              6.6        p = .00       .56
24       4.95             3.3        p = .11     -1.06
25       3.29             1          p = .00     -1.39

Table 1: Calculated Z-scores, means, and t-test
results; *N = 24, 95% confidence.


Figure 1: Expected and participant complexity
levels (chart title: Expected levels and participant
response means; series: Expected, Response mean)


4.4  Conclusion 

Two  salient  conclusions  can  be  drawn  from 
the  results  of  this  study.  First,  Z-scores 
indicate  significant  disagreement  at  both  ends 
of  the  complexity  spectrum,  suggesting  that 
the  low-  and  high-end  complexity  scenarios 
have  a wider  range  of  responses  than  do  the 
middle-ground  values.  Second,  there  was  a 
significant  level  of  agreement  of  perceived 
complexity  relating  to  those  situations  which 
occupy  the  middle-ground. 

5.0  DISCUSSION 

These  results  indicate  areas  of  both  promise 
and  challenge.  As  a proof  of  concept,  the 
authors  believe  their  formalization  of  scenario 
complexity  is  headed  in  the  correct  direction; 
however,  there  are  adjustments  and 
considerations  which  must  be  accounted  for 
in  future  iterations.  Attempting  to  quantify 
subjective  perceptions  is  rife  with 
inconsistency.  Two  participants  may  give  two 
different  values  for  the  same  situation  even 
though  they  both  perceive  the  situation  as 
being  very  simple.  Further,  levels  of 
experience  may  have,  as  Klein  suggests, 
played  a discriminating  role.  Participants  with 
a larger  repository  of  experiences  to  draw 
from may have perceived the described
situations with a lower level of difficulty
compared to those with fewer experiences.
This argues not only for the support of RPD
but also for the necessity to adapt training
and complexity levels to the trainee within
their zone of proximal development or risk
demotivation and inefficient training.

Additionally, disparity between participant
responses and the a priori calculated values
may point to participants’ over-confidence
and/or a lack of understanding of the
characteristics of the situation. That is, while
the scenario complexity formula takes into
account such characteristics as number of
task outcomes and cognitive context
moderators, participants may not be able to
readily identify such attributes. Similarly,
participants may identify such factors, but
may not consider them impactful. For
instance, in the item that asked participants
for an evaluation of their ability to utilize a
cell phone while in traffic during a
thunderstorm, over-confidence may lead some to
inaccurately assess the challenge of
performing under such conditions.

Lessons  from  this  pilot  study  suggest  further 
refinement  of  both  the  formulation  of  scenario 
complexity  as  proposed,  and  of  the  research 
design.  In  order  to  investigate  validity  and 
reliability  across  the  entire  spectrum  of 
complexity,  future  research  requires 
recalculation  of  the  weight  given  to  each 
characteristic,  and  reformulation  of  the  task- 
framework  equation  in  particular,  in  addition 
to  controlling  for  levels  of  experience  and 
including  more  detailed  instructions  for 
participants. 

Branching  outward  from  this  avenue  of 
investigation,  promising  areas  of  research 
include,  but  are  certainly  not  limited  to, 
investigating  the  role  of  cognitive  load  on 
scenario  complexity.  That  is,  does  objective 
calculation  of  complexity  adequately  address 
both  a novice  and  expert  trainee’s  different 
requirements  of  cognitive  load?  Second,  in 
respect  to  integration  of  team  members,  how 
do  multiple  agents  impact  an  individual’s 
performance  in  dynamic,  adaptive  scenarios? 


Third,  to  what  degree  is  perception  of 
scenario  fidelity  affected  by  increasing 
scenario  complexity  following  Vygotsky’s 
ZPD?  Finally,  what  role  do  personal 
characteristics  such  as  metacognition  and 
self-efficacy play in trainees’ performance?

Soft skills, also called "people skills," are typically hard to observe, quantify and measure. These skills have to do with how we relate
to each other: communicating, listening, engaging in dialogue, giving feedback, cooperating as a team member, solving problems
and resolving conflicts. Most of the soft skills training is scenario based, utilizing written or video-based scenarios, with limited or no
branching, as well as quantitative feedback. This paper will outline a game-based approach to configurable, scenario-based, soft
skills training. The paper will discuss the application of realistic visual behavior cues (e.g. body language, vocal inflection, facial
expressions) and how these can benefit the learner. Using the concept of a "virtual vignette," this paper will discuss a prototype
system intended to teach suicide prevention and provide qualitative feedback to the learner. The paper will also explore other
soft-skills training applications for this technology.


1.0  INTRODUCTION 

By  definition,  soft  skills  are  more  than  just 
tangible  facts.  According  to  Konopka  and 
Dupre,  while  it  is  difficult,  it  is  not  impossible 
to do “soft skill” development in non-traditional,
non-face-to-face settings. In fact,
there  may  actually  be  real  advantages  to 
doing  it  “from  a distance”  [2].  This  paper  will 
present  an  approach  to  using  gaming 
technology  to  teach  soft  skills,  with  a focus 
on  teaching  suicide  prevention,  and  looking 
at  other  potential  applications. 

The US Army [7] has provided a number of
resources to teach its leadership and
soldiers how to recognize and properly
deal with a potential suicide. The mission of
the  Army  Suicide  Prevention  website  [6]  is 
to  improve  readiness  through  the 
development  and  enhancement  of  the  Army 
Suicide  Prevention  Program  policies.  These 
policies  are  designed  to  minimize  suicide 
behavior,  thereby  preserving  mission 
effectiveness  through  individual  readiness 
for  Soldiers,  their  Families,  and  Department 
of  the  Army  civilians.  A major  component  of 
the  training  related  to  this  Suicide 
Prevention  Program  focuses  on  "soft  skills." 

2.0  SOFT  SKILLS 

The  US  Office  of  Personnel  Management 
[1]  has  identified  an  overlapping  set  of  core 
competencies  for  executive  development. 


They  are  Leading  Change,  Leading  People, 
being  Results  Driven,  having  Business 
Acumen,  and  Building  Coalitions.  Within 
these  competencies  are  subsets  of  soft 
skills  that  could  be  considered  relevant  for 
Suicide  Prevention  “soft  skills”  training: 

Leading  Change 

• Flexibility 

o Is  open  to  change  and  new  information; 
rapidly  adapts  to  new  information, 
changing  conditions,  or  unexpected 
obstacles. 

Leading  People 

• Conflict  Management 

o Encourages  creative  tension  and 
differences  of  opinion.  Anticipates  and 
takes steps to prevent
counter-productive confrontations. Manages and
resolves  conflicts  and  disagreements  in 
a constructive  manner. 

Results  Driven 

• Decisiveness 

o Makes  well-informed,  effective,  and 
timely  decisions,  even  when  data  is 
limited  or  solutions  produce  unpleasant 
consequences; perceives the impact
and  implications  of  decisions. 

• Problem  Solving 

o Identifies  and  analyzes  problems; 
weighs  relevance  and  accuracy  of 
information;  generates  and  evaluates 
alternative  solutions;  makes 
recommendations. 

Building  Coalitions 

• Influencing/Negotiating 


o Persuades  others;  builds  consensus 
through  give  and  take;  gains 
cooperation  from  others  to  obtain 
information  and  accomplish  goals. 

Every  one  of  these  soft  skills  could  be 
utilized  when  dealing  with  a soldier  who  is  a 
potential  suicide  risk. 

3.0  CHALLENGES 

According  to  Lauren  Smith  [3]  there  are 
several  challenges  to  developing  effective 
soft  skills  training.  A common  mistake  is  to 
limit  interactivity.  Web-based  training  that 
consists  mostly  of  text,  graphs  and  simple 
images tends to fall short when it comes to
the  transfer  and  application  of  the 
knowledge. Material in a format that
does not encourage participants to consider
how to apply a skill in a variety of situations and
contexts will not help participants actually
learn the skill.

Another  challenge  is  to  make  abstract  soft 
skills  concepts  more  concrete  and  tangible 
by  providing  the  appropriate  steps, 
definitions,  illustrations,  and  relevant 
examples  of  how  to  apply  them. 

It  is  important  to  allow  participants  to 
evaluate  their  current  behavior  and  to  gauge 
their  proficiency.  The  challenge  is  to  provide 
relevant  scenarios  that  illustrate  both  poor 
and  excellent  use  of  the  associated  soft 
skill.  Students  may  have  trouble  retaining 
general  concepts,  but  they  will  remember 
the  scenarios  if  they  can  relate  to  them. 

Keeping  the  students  interested  is  another 
challenge  to  delivering  effective  soft  skills 
training.  Students  are  more  inclined  to 
practice  and  learn  the  information  presented 
in  a training  program  when  they  are 
engaged. This used to mean incorporating a
combination of different types of interactive
multimedia such as audio, video, graphics,
animations and games into the content.
Content  that  simply  emulates  the  standard 
“Death  by  PowerPoint”  will  not  engage 
today’s  young  learners. 


To  make  soft  skills  training  effective,  one 
would  need  to  provide  expert  feedback  to 
participants  throughout  in  order  to  make 
them  aware  of  their  progress.  Immediate 
feedback  is  important  to  learning  and  should 
be  incorporated  into  all  web-based 
exercises  where  possible. 

4.0  ROLE  PLAYING 

According to Charles Green [11], soft skills
training  comes  in  three  forms:  role  plays, 
video  replays  and  case  discussions.  The 
concept  of  using  role  playing  techniques  to 
teach soft skills is not new. According to
Green,  there  is  no  substitute  for  realistic 
“muscle  memory”  activity  when  it  comes  to 
learning  soft  skills.  Role  playing  allows  one 
to  present  more  complex,  hypothetical 
scenarios  [11]. 

An  example  of  this  is  a recent  program  [9] 
designed  to  be  used  by  trained  speakers. 
The  goal  was  to  offer  doctors  a look  at  how 
their  treatment  decisions  affect  long-term 
outcomes  for  patients  with  type  2 diabetes 
through  3-D  video  animations.  In  this 
software  solution,  a doctor  would  role-play 
with  a virtual  patient.  These  virtual  patients 
were  created  such  that  the  instructing 
physician  could  alter  the  characteristics  of  a 
selected  patient  in  terms  of  their  weight,  age 
and  A1C  levels. 


FIGURE 1: Video-based Role Playing
Application (virtual patient diabetes education program)

Changing any of these characteristics
presented  the  instructor  with  immediate 
visual  feedback.  The  selected  patient  would 
get  immediately  thinner  or  heavier,  younger 
or  older,  based  upon  the  associated  settings 
that  were  selected  via  an  intuitive  Graphical 
User  Interface  (GUI).  The  back-end 
software  provided  over  900  different 
scenarios  based  upon  the  combinations  of 
settings  that  the  instructor  selected.  Both 
videos  and  text  were  used  to  display  the 
selected patient’s “responses” to the
doctor’s examination. This tool provided an
interactive approach to getting doctors to
engage  in  discussions  about  potential 
treatment  plans  for  the  presented  patient.  It 
also  afforded  the  instructor  a wide  array  of 
patient  profiles. 

The  solution  utilized  live  actors  to  represent 
each  patient.  Over  a period  of  several 
weeks, make-up artists made each patient
“older” and “fatter.” At each stage, the
patient was photographed and videotaped
both  sitting  on  the  examination  table  and 
responding  to  questions. 

And  while  the  end  product  was  visually 
appealing,  the  implementation  does  have  its 
drawbacks.  Any  desired  additions  or 
changes  to  the  scenarios  would  require  the 
actors,  make-up  artists  and  videographers 
to  once  again  be  brought  together  and 
create  the  desired  footage.  If  an  actor 
wasn’t  available,  then  this  could  negatively 
affect  the  product.  These  types  of  changes 
are  costly  in  both  time  and  dollars. 


5.0  TECHNOLOGY  TO  THE  RESCUE 

According  to  Margaret  Kaeter  [10],  today’s 
soft  skills  programs  are  more  like  arcade 
games. Advances in both Commercial
Off-The-Shelf (COTS) gaming technology, as
well  as  improvements  in  3D  modeling  and 
character  animation,  make  these  viable 
alternatives  to  using  live  actors.  3-D  worlds 
similar  to  the  popular  Second  Life  also  have 
potential  for  teaching  soft  skills,  but  are 
beyond  the  scope  of  this  paper. 


The  goal  is  to  merge  the  engaging  nature  of 
the  video  game  and  the  experiential  nature 
of  this  medium  to  teach  soft  skills.  A 
blended  Interactive  Multimedia  Instruction 
(IMI)  approach  to  this  training  may  be  the 
right  direction.  The  ideal  solution  might 
utilize  a combination  of  media,  such  as 
slides  and  videos,  to  teach  the  basic 
concepts  for  the  utilization  of  necessary  soft 
skills,  and  gaming  technology  to  provide  the 
scenarios.  The  remainder  of  this  paper  will 
focus  on  the  COTS  gaming  application  for 
the  visualization  of  these  training  scenarios. 


6.0  VIRTUAL  VIGNETTE 

The  concept  for  the  Virtual  Vignette  is  that 
of  a role-playing  game.  The  person  playing 
the  game  (a  trainee)  is  placed  into  a 
scenario  where  they  must  interact  with  the 
game  character  that  is  a potential  suicide 
risk.  At  the  start  of  the  game,  the  trainee 
gets  to  select  their  character,  and  alter 
certain  elements  for  the  scenario. 


FIGURE  2:  Virtual  Vignette  Character 
Properties  Screen 

These elements might include the
character’s stress level, age, suicide risk,
tours  of  duty,  etc.  All  of  these  elements 
would  have  corresponding  questions, 
responses,  actions,  animations  and 
consequences.  These  items  would  also 
have  a time  element  associated  with  them 
such  that  any  interaction  delays  could 
escalate  the  consequences  and  vastly 
change the outcome of the game.

The Virtual Vignette will also provide an
After  Action  Review  (AAR)  so  that  the 
trainee  can  get  feedback  as  to  what  they  did 
right,  or  what  they  did  wrong,  and  why.  The 
AAR  will  also  detail  sequence  errors  within 
the  Virtual  Vignette. 
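
One way an AAR might detect sequence errors is to compare the order of the trainee's actions with an expert-defined order. The sketch below is a hypothetical illustration under that assumption; the function name and action labels are invented for the example and do not describe the prototype's implementation.

```python
def sequence_errors(expected, performed):
    """List the actions the trainee took out of the expert-defined order.

    An action is flagged when it is performed after an action that the
    expert sequence places later.
    """
    rank = {action: i for i, action in enumerate(expected)}
    errors = []
    highest_seen = -1
    for action in performed:
        if action not in rank:
            continue  # actions outside the expert sequence are ignored here
        if rank[action] < highest_seen:
            errors.append(action)
        else:
            highest_seen = rank[action]
    return errors


# Hypothetical expert ordering for a vignette.
EXPERT_ORDER = ["greet", "ask_feelings", "ask_plan", "escort_to_help"]
```

Calling `sequence_errors(EXPERT_ORDER, trainee_actions)` would return the actions to highlight in the review, where `trainee_actions` is the recorded order of the trainee's actions.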


6.1  Look  And  Feel 

In  order  for  game-based  training  to  be 
effective,  it  must  look  real,  feel  real  and 
sound  real.  Today’s  gaming  technology 
brings  together  the  confluence  of 
technologies  necessary  to  make  an 
effective  training  scenario  for  suicide 
prevention.  The  first  of  these  technologies  is 
the  realistic  3D  modeling  of  characters  and 
environments.  The  characters  need  to  look 
and  move  like  real  people.  The  avatars  in 
virtual  worlds,  such  as  Second  Life,  have 
the  level  of  detail  or  realism  that  today’s 
young  gamers  have  come  to  demand.  If  it 
doesn’t  look  real,  it  will  not  be  as  engaging. 

The  3D  characters  and  their  environment 
also  need  to  be  relevant.  If  a person  is 
attempting  to  teach  soldiers  about  suicide 
prevention,  then  the  characters  need  to  be 
in  the  appropriate  uniform  or  dress,  and 
should  also  be  germane  to  the  theater  of 
operation  that  the  trainee  finds  themselves 
in.  A soldier  about  to  be  deployed  to 
Afghanistan  should  be  presented  characters 
in  the  appropriate  desert  camouflage,  with 
terrain  and  quarters  that  are  representative 
of  the  locale.  Anything  else  will  detract  from 
the  effectiveness  of  the  training  and  weaken 
the  overall  experience. 


FIGURE  3:  Virtual  Vignette  Character 
Interaction  Screen  (Role  Play) 

The  animations  and  movements  of  the  3D 
characters  need  special  attention  as  well.  A 
character  that  represents  a potential  suicide 
risk  needs  to  display  the  appropriate  body 
language  and  gestures  such  that  the 
scenarios  feel  real.  Subject  Matter  Experts 
(SMEs)  must  work  closely  with  3D  graphic 
artists  and  animators  to  get  all  of  the 
character’s  idiosyncrasies  and  body 
movement  correct.  If  one  looks  at  top  selling 
video  games,  a lot  of  detail  goes  into  the 
realism  of  the  3D  character  movements.  If 
quality  is  lacking  in  this  area,  the  focus  will 
be  taken  away  from  the  realism  of  the 
training.  A person  only  needs  to  have 
watched  a movie  with  poor  acting  to 
understand  how  a bad  actor  will  take  the 
focus  away  from  the  story. 

Sound is often overlooked, but it is another
key  component  to  the  realism  of  the  virtual 
vignette.  3D  Characters  need  to  speak,  and 
their  vocal  inflections  need  to  convey  the 
emotions  appropriate  for  the  moment. 
Background  noise  is  also  important  for  the 
realism  of  the  scenario.  If  the  characters  are 
standing  near  a truck,  a player  should  hear 
the  engine  idle.  If  a door  opens,  he  or  she 
should  hear  it  open  and  then  slam  shut. 

Details,  even  minute  ones,  are  paramount 
to  creating  a realistic  scenario.  Examples 
include,  but  are  not  limited  to,  the  lighting, 
items on the floor, dust blowing, etc. Though
they are infinitely easier to render in 3D
when barren, objects such as desks are not
typically clean, flat surfaces in real life, and
they shouldn’t be in a virtual one. These
details  add  to  the  realism  and  help  sell  the 
scenario.  These  items  could  even  include 
tools  that  one  might  need  the  player  to 
interact  with  such  as  a telephone  (to  call  for 
help),  a pencil  and  paper  (to  write  notes),  or 
even  a gun  held  by  the  soldier  considering 
suicide. 


6.2  Consequences 

A good game needs to have consequences,
both positive and negative. These outcomes
are  driven  by  the  choices  made  by  the 
trainee.  The  outcomes  should  include 
positive  and  negative  extremes,  with  various 
results  in  between.  In  the  case  of  a suicide 
prevention  vignette,  the  obvious  negative 
outcome  is  that  the  soldier  harms  himself  or 
another  character.  The  positive  outcome 
would  be  that  the  soldier  agrees  to 
accompany the trainee to seek help. SMEs
would  work  with  the  game  scenario 
designers  to  detail  alternative  outcomes 
between  these  extremes. 


6.3  Interaction 

The  trainee  playing  the  game  needs  to  have 
some  degree  of  freedom  to  move  about  the 
virtual environment. Keep in mind that not
everyone who uses this training will be a
“gamer,” so keyboard controls and keys that
invoke specialized moves should be kept to
a minimum.

In  order  for  a game  to  be  engaging,  it  needs 
to  challenge  the  player.  A simple  way  to 
achieve  this  is  to  vary  the  scenario  slightly 
each  time.  In  a role  playing  game  such  as 
this,  the  order  in  which  the  trainee  asks  the 
questions  of  the  3D  character  may  impact 
the outcome. Based upon reactions from
the avatar, certain questions could become
unavailable, or new questions could be added.
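
The order-dependent questioning described above can be sketched as a small state machine. This is only an illustration under stated assumptions, not the Virtual Vignette's implementation: the `Question` and `Dialogue` names, the `unlocks`/`locks` fields, the reaction labels, and the "weapon" question are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    """One question the trainee can ask (all field names are hypothetical)."""
    text: str
    reaction: str                                    # label for the avatar's response
    unlocks: set[str] = field(default_factory=set)   # question ids made available
    locks: set[str] = field(default_factory=set)     # question ids made unavailable


class Dialogue:
    """Tracks which questions are currently available to the trainee."""

    def __init__(self, questions: dict[str, Question], initial: set[str]):
        self.questions = questions
        self.available = set(initial)
        self.history: list[str] = []

    def ask(self, qid: str) -> str:
        if qid not in self.available:
            raise ValueError(f"question {qid!r} is not currently available")
        q = self.questions[qid]
        self.history.append(qid)
        # The reaction adds and removes questions; asked questions never return.
        self.available = (self.available | q.unlocks) - q.locks - set(self.history)
        return q.reaction


# Hypothetical scenario data for illustration only.
QUESTIONS = {
    "feel": Question("How do you feel?", "guarded", unlocks={"relationships"}),
    "weapon": Question("Why do you have a weapon?", "agitated",
                       locks={"relationships"}),
    "relationships": Question(
        "Are you having problems with your relationships?", "opens up"),
}
```

With this data, asking "feel" and then "weapon" leaves no questions available, while asking "weapon" and then "feel" leaves the "relationships" question open, so the same two questions lead to different states depending on their order.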


The  starting  positions  of  the  characters  can 
also  impact  the  overall  game  play  and 
scenario  randomness.  For  example,  if  the 
scenario  is  set  in  a barracks,  the  character 
can  be  placed  either  close  to  the  trainee  or 
further  away.  As  the  player  starts  to 
approach  the  avatar,  a collision  boundary 
could  detect  the  proximity  and  trigger  an 
adverse  reaction  from  the  character. 
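
A minimal sketch of the proximity trigger described above, assuming a flat 2D floor plan, a circular boundary, and a trainee who starts at the origin. The `Character` type, the two start positions, and the 1.5-unit radius are hypothetical values chosen for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Character:
    x: float
    y: float
    trigger_radius: float   # hypothetical radius of the collision boundary
    alarmed: bool = False


def place_character(rng: random.Random) -> Character:
    """Pick one of two hypothetical start positions: near or far from the trainee."""
    x, y = rng.choice([(2.0, 0.0), (12.0, 0.0)])
    return Character(x, y, trigger_radius=1.5)


def update_proximity(character: Character, px: float, py: float) -> bool:
    """Flag the character as alarmed once the player is inside the boundary."""
    distance = math.hypot(px - character.x, py - character.y)
    if distance <= character.trigger_radius:
        character.alarmed = True   # the adverse reaction stays triggered
    return character.alarmed
```

Calling `update_proximity` once per frame with the player's position would be one way to drive the reaction. Keeping `alarmed` set after the player backs away is a design choice in this sketch, not something the paper specifies.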

The role play interaction provides an
excellent mechanism to focus on behavioral
indicators and cues, helping to make the
concepts more concrete. It will also help the
participants gauge their own skill levels and
show them ways they can improve
their soft skills.


6.4  Programming 

The  complexity  of  the  Virtual  Vignette 
should  not  be  understated.  There  is  a lot  of 
complicated  programming  and  logic  behind 
a system  of  this  nature.  However,  once  the 
base  structure  has  been  built,  the  potential 
for  soft  skills  training  is  virtually  limitless. 

A core  component  for  the  Virtual  Vignette  is 
the  scenario  repository.  This  repository  will 
contain  all  of  the  questions,  responses, 
expert  assessments,  environments,  sounds, 
character  files  and  associated  animations. 
The technical aspects of how all these
components are constructed and integrated
are beyond the scope of this paper.

However,  it  should  be  noted  that  the  Virtual 
Vignette  will  have  a modular  architecture, 
work  within  a standard  web  browser,  and  be 
SCORM  conformant. 

In lieu of live coaches, computer-generated
feedback on responses to each scenario will
serve as a self-check for the participants.


7.  ALTERNATIVE  APPLICATIONS 

While the initial application for the Virtual
Vignette is suicide prevention training,
it has the potential to expand to
other training initiatives for either soft or
hard skills. Cultural awareness and
sensitivity would be a logical application for
this technology, where the avatar could be a
village elder or even a suspected insurgent.
The framework could also be applied to the
Army Combat Lifesaver Course, where the
player would need to approach, assess,
treat, and successfully evacuate a wounded
comrade.


8.  CONCLUSION 

Soft  skills  training  will  continue  to  be 
important  for  employees  in  all  types  of 
professions.  The  military  is  no  exception. 
The  type  of  computer-based  training  that 
has  been  discussed  in  this  paper  alleviates 
the  discomfort  of  role-playing  exercises. 
Students  can  go  through  training  at  their 
own  pace,  and  take  responsibility  for  their 
own  development.  Inevitably,  this  leads  to 
better  overall  soft  skills  training  results. 
Games  that  leverage  3D  modeling  and 
simulation  are  excellent  for  delivering 
realistic  role  play  scenarios. 

Regardless  of  the  technology,  it  is  important 
to  recognize  that  solid  content  is  at  the  core 
of  quality  soft  skills  training.  Any  effective 
training  for  soft  skills  must  use  a sound 
development  approach  and  identify  learning 
objectives.  Ultimately,  soft  skills  training  is 
only as good as its content.


Introduction

• Soft skills: more than just tangible facts

• Difficult to do “soft skill” development in non-traditional,
non-face-to-face settings

• Utilize gaming technology to teach soft skills

• A major component of the Suicide Prevention
Program training focuses on “soft skills.”

Soft Skills

• Core  competencies 

- Leading Change

- Leading People

- Being Results Driven

- Building Coalitions

• Subsets  of  soft  skills  relevant  for  Suicide 
Prevention  training 

- Flexibility 

- Conflict  Management 

- Decisiveness 

- Problem  Solving 

- Influencing/Negotiating

Challenges

• Not to limit user interactivity

• How do we make abstract concepts concrete
and tangible?

• Provide relevant scenarios

• Keeping the students interested

• Provide expert and immediate feedback

Role Playing

• Three  forms: 

- Role  plays 

- Video  replays 

- Case  discussions 

• Role  playing  allows  for  more  complex  and 
hypothetical  scenarios 

• No  substitute  for  realistic  “muscle  memory” 
activity

Technology to the Rescue

• Today’s soft skills programs are more like arcade
games

• Viable alternatives to using live actors

- COTS gaming technology

- 3D modeling and character animation

• Goal: merge the engaging nature of video
games and the experiential nature of this
medium

• Use a blended IMI approach

Virtual Vignette

• Easy 

- to  populate  with  scenarios/modify  existing  scenarios 

- to  update 

- to  re-use 

- to  deploy 

• Engage 

- interactive  3D  game-like  environment  to  role-play 

• Evaluate 

- After-action  Review  recaps  what  was  done  correctly, 
incorrectly  or  out-of-sequence. 

• Educate 

- Repository  of  relevant  videos  and  supporting  materials  (i.e. 
documents, presentations, and images)

Virtual Vignette

[Screenshots: Virtual Vignette interface with a Questions panel offering prompts such as “How do you feel?” and “Are you having problems with your relationships?”]

Virtual Vignette

• Look  And  Feel 

- 3D  characters  and  environment  need  to  be  relevant 

- Animations  and  movements 

- Realistic  sound 

• Consequences 

- both  positive  and  negative 

- detail  alternative  outcomes  between  extremes 

• Interaction 

- needs  to  challenge  the  player 

- not everyone is a gamer

Alternative Applications

• Cultural  awareness  and  sensitivity 

• Army  Combat  Lifesaver  Course 

• Leadership  Training 

• Sexual  Harassment  Prevention 

• Only limited by our imagination

Conclusion


• Soft  skills  training  will  continue  to  be  important 

• Games  that  leverage  3D  modeling  and 
simulation  are  excellent  for  realistic  role  play 

• Computer-based  training  alleviates  the 
discomfort  of  role-playing  exercises 

• Students  can  go  through  training  at  their  own 
pace 

- Take  responsibility  for  their  own  development. 

• Training is only as good as its content


7.4  Defining and Leveraging Game Qualities for Serious Games


Defining and Leveraging Game Qualities for Serious Games

Michael W. Martin & Yuzhong Shen
Old Dominion University

Department of Modeling, Simulation, and Visualization Engineering
mmart081@odu.edu  yshen@odu.edu

Abstract. Serious games can and should leverage the unique qualities of video games to effectively deliver educational experiences
for the learners. However, leveraging these qualities depends upon understanding what these unique 'game' qualities are, and
how they can facilitate the learning process. This paper presents an examination of the meaning of the term 'game', as it applies to
both serious games and digital entertainment games. Through the examination of counterexamples, we derive three game
characteristics: games are self-contained, provide a variety of meaningful choices, and are intrinsically compelling. We also discuss
the theoretical educational foundations which support the application of these 'game qualities' to educational endeavors. This paper
concludes with a presentation of results achieved through the application of these qualities and the applicable educational theories
to teach learners about the periodic table of elements via a serious game developed by the authors.


1.0  INTRODUCTION 

The  term  “serious  games”  is  somewhat 
open-ended,  and  even  people  who  work 
with  serious  games  have  a hard  time 
agreeing  upon  its  exact  meaning.  This  work 
is  not  presented  with  the  intent  of  settling 
the  debate  on  the  meaning  of  the  term,  but 
simply  as  an  effort  to  add  to  the  greater 
body  of  discussion.  Additionally,  this  work 
is  presented  because  the  authors  believe 
that  this  approach  to  looking  at  serious 
games  can  be  useful,  not  only  in  the 
academic  sense  of  defining  the  term,  but  in 
actual  development  of  serious  games,  as 
well. 

To that end, this paper will first present a
discussion of common definitions of both
“games” and “serious games”, followed by
an effort to identify some of their salient
characteristics to clarify the definitions.

Then,  the  paper  will  discuss  the 
development  and  deployment  of  a serious 
game  that  was  created  by  the  authors, 
based  on  these  concepts. 

2.0  BODY 

2.1  Definitions  for  Game  and  Video 
Game 

Merriam-Webster's online dictionary
provides several definitions of the term
“game” which are relevant to the
discussion:

“3a(1): a physical or mental competition
conducted according to rules with the
participants in direct opposition to each
other ... 3a(5): the manner of playing in a
contest ... 3c(2): any activity undertaken or
regarded as a contest involving rivalry, strategy,
or struggle” [1].

There are a number of more in-depth works
on the subject of games, and they all tend
towards these same basic definitional
components. In the 1961 book Man, Play
and Games, Roger Caillois described games
as activities that are fun, distinct,
uncertain, non-productive, rule-driven, and
fictitious [2]. Clark Abt, in 1970, described
a game as an “activity among two or more
independent decision-makers seeking to
achieve their objectives in some limiting
context” [3]. More recently, Katie Salen and
Eric Zimmerman described games in terms
of “artificial conflict,
defined by rules, that results in a
quantifiable outcome” [4]. While each
definition has its own nuances, in general
they closely resemble the dictionary
definitions above. The components that
recur in these definitions can be summed up as
participants, goals, rules, and challenges.
The participants have goals within the
game, which they try to achieve via a set of
rules. The rules define the participants’
interactions, and in the application of these
rules, the participants try to overcome
challenges in order to achieve their goals.

These definitions apply to the general term
“game”. However, the focus of this paper is
on computer-based games, which simply adds
the stipulation that the game in
question takes place on a computer. Michael
Zyda, Director of the University of Southern
California Viterbi School of Engineering’s
GamePipe Laboratory, builds on a similar
dictionary-based definition, proposing that a
video game is:

“A mental contest, played with a computer
according to certain rules for amusement,
recreation, or winning a stake.” [5] (emphasis
added)

2.2  Definitions  for  Serious  Game 
and  Definitional  Pitfalls 

Having  established  a basic  definition  for 
“game”,  the  next  step  is  to  determine  the 
meaning  of  “serious  games”. 

In  understanding  the  nature  of  a Serious 
Game,  as  defined  in  this  paper,  it  is  helpful 
to  consider  that  game  players  learn 
something in every game. If this were not
true, then a player’s performance in a
game would be the same the first time they
played as the last. However, this is not so:
players improve their performance by
learning the mechanics which govern the
game.

Most  of  the  time,  what  the  players  learn  is 
useless  in  the  real  world.  Players  of 
Nintendo’s  famous  game  Super  Mario  Bros. 
[6]  learn  that  mushrooms  are  evil,  but  that 
the  player  can  jump  on  them  to  kill  them. 
They  also  learn  that  they  can  jump  on 
turtles  and  use  their  shells  to  kill  other 
enemies,  like  the  mushrooms.  Learning 
these  aspects  of  the  game  helps  the  players 
perform  better,  but  this  knowledge  usually 
transfers  very  poorly  to  the  real  world. 

In  a Serious  Game,  the  knowledge  that  the 
user  learns  within  the  game  is  transferable 


to  the  real  world.  The  game  mechanics  and 
content  have  a sufficient  degree  of  fidelity 
with  real  life  mechanics  and  subject.  When 
the  player  learns  something  in  the  game, 
that  knowledge  is  transferable,  within 
reason,  to  the  real  world  as  well  as  the 
game. 

A number  of  established  serious  game 
authorities  define  serious  games  in  terms  of 
their  intent.  One  of  the  most  succinct 
definitions  for  serious  games  is  found  on  the 
Michigan  State  University  Serious  Game 
Design  Program  webpage: 

“Serious  games  are  games  with  purpose  beyond 
just  providing  entertainment."  [7] 

Put another way, “serious games” are
games played for a serious intent or reason.
Zyda proposes a similar definition, though
with more detail as to the nature of the
purpose:

“Serious game: a mental contest, played with a
computer in accordance with specific rules, that
uses entertainment to further government or
corporate training, education, health, public
policy, and strategic communication objectives.”
[5]

Building upon the definitions above, a
Serious Game might be “an activity
consisting of participants, goals, rules, and
challenges with a purpose beyond
entertainment.”

These definitions for “Games” and “Serious
Games” are useful for grounding the
baseline understanding of these concepts.
However, as analytic propositions, they may
also be seen as being overly inclusive.

They  encompass  the  concepts  in  widely 
applicable  terms,  and  as  a result,  may  also 
be  valid  for  things  which  are  not  games. 

For  example,  a game  of  football  has 
participants,  goals,  rules,  and  challenges. 
However, arguably the chore of mowing the
lawn has all of these components as well.
This does not necessarily invalidate the
definition so much as highlight that the
definition does not provide sufficient
information to distinguish games from
non-games.

Normally,  a traditional  game  maker  will  not 
suffer  from  such  a definitional  dilemma. 
Arguably,  the  one  and  only  value  metric 
used  for  entertainment  games  is  their  "fun" 
or  entertainment  value.  Serious  games  add 
a competing  metric:  the  ulterior  purpose 
beyond  entertainment.  This  additional 
aspect  can  confound  the  value  assessment. 

With  these  competing  values,  the  field  of 
serious  games  can  become  quite  confusing. 
Organizations  of  all  sizes  and  types  attempt 
to  leverage  viable  serious  games  to  achieve 
their  goals.  Ambiguity  in  what  constitutes  a 
game  allows  the  term  to  be  applied  to  a 
variety  of  efforts  which  are  perhaps  better 
identified  as  non-games. 

As with the “lawn mowing” example above,
the serious game definition might
encompass the use of a budgeting program.
Budgeting programs have participants, goals,
rules, challenges, and a purpose beyond
entertainment. Such a definition might be
technically  correct,  but  ultimately  it  is 
unhelpful  in  trying  to  determine  how  to  best 
make  use  of  serious  games.  Clearly,  if 
someone  expected  a game,  and  was  given 
a budget  to  balance,  they  might  be 
dissatisfied.  Such  confusion  could 
contribute  to  disillusionment  on  the  part  of 
serious  game  customers,  users,  and 
developers. 

2.3  Characterization  of  Serious 
Games 

To better understand and apply the term
“serious games”, it is helpful to identify what
aspects of games distinguish them from
non-game activities. Two counterexamples,
activities often labeled serious games, help
provide additional clarity through contrast. The first
counterexample is generic computer-based
training, and the second is computerized
sandbox training.

2.3.1  Computer  Based  Training 

There are a number of computer-based
learning or training applications which have
seized upon the “Serious Games” trend.
They apply game-like facades to traditional
computer education activities and declare
them to be games. This misappropriation of
the term “Serious Game” is compounded by
the fact that, as with the examples above,
many traditional computer-based learning
activities fulfill the definition of a game
provided above.

A quiz  or  test,  common  to  most  educational 
systems,  is  a prime  example  of  an  activity 
that  can  satisfy  the  majority  of  the  game 
attributes  listed  above.  A quiz  has  rules  (fill 
in  the  blank,  multiple  choice,  etc.),  it  has 
goals  (to  score  the  highest  possible  score) 
and  conflict  (the  difficulty  of  the  questions). 

It is difficult to argue that quizzes or tests
are fun, but some educational applications
decorate test-like experiences with “fun”
veneers, such as cute graphics and silly
sound effects, and label them serious
games. Two examples of this are Grammar
Gorillas [8] and Snork’s Long Division [9].
Grammar Gorillas has the player select
specific parts of a sentence. Snork’s
Long Division has the player simply perform
long division. Both are self-declared serious
games, but clearly fall short of what would
normally be considered a game.

In examining why these activities do not
seem like games, one finds that at their core,
these programs don’t provide the learner
with any meaningful choices. The user
simply answers the question correctly or
fails the question. While the students
choose how to answer the question, these
choices are not meaningful in that there is
only one correct answer, with no viable
alternative. Raph Koster identified this
pitfall in A Theory of Fun and Learning [10].
Using  Tic-Tac-Toe  as  an  example,  he 
illustrates  that  the  game  ceases  to  be  a 
game  when  the  players  learn  that  they  have 
no  choices.  At  that  point,  the  game 
becomes  a simple  drill  in  the  rote 
application  of  logic. 

Further, by not providing meaningful
alternatives, these software examples limit
the potential for creating a psychosocial
moratorium, as described by James Paul
Gee. The psychosocial moratorium, a
phrase which Gee adapts from psychologist
Erik Erikson, is “a learning space in which
the learner can take risks where real world
consequences are lowered” [11]. The lack
of choice collapses the exploratory learning
space and deprives the player of the
opportunity to reflect upon the information
being presented and applied. David Shaffer
argues that games have the potential to
allow a more “authentic” method of learning
than traditional schooling techniques
because games set the stage for learners to
think not only about what the right answer
is, but also how they know an answer is
right and what process they used to
arrive at that answer [12]. Quizzes, which
limit the “player’s” interaction with their world
(the quiz) to a single correct answer, with all
other options being incorrect, do not
engender the same degree of introspection
that a wider array of viable choices might.

In order to provide the sense of freedom
that fosters exploration and, as Gee and
Shaffer suggest, learning, games need to
adhere, to an extent, to game designer Sid
Meier’s dictum that games are a “series of
interesting decisions” [13]. Meier is
renowned for his complex and involved
strategy games, like the Civilization and
Tycoon series, where each decision can
have profound consequences. However,
games can provide the player with
meaningful choices without having to resort
to such elaborate depth. The term
“meaningful”, in this sense, can refer not to a
profound consequence for the available
choices, but rather to having any
consequence at all.

Some  activities,  like  the  above  mentioned 
quizzes,  offer  no  meaningful  choices  - the 
player’s  only  option  is  to  answer  the 
question  correctly.  At  the  other  extreme, 
some  games  offer  a wide  array  of  choices 
that  effectively  have  no  consequence.  For 
example,  many  games  allow  the  player  to 
visually  customize  their  character.  A player 


can spend hours adjusting the eye color,
cheek bone height, hair style, et cetera, but
ultimately none of that has any effect on the
game play. In contrast, a meaningful choice
allows the player to select between viable
alternatives with concrete consequences,
without any clear optimal choice, thereby
allowing the player to freely explore the
conceptual space created by the game.

In order to differentiate games from quizzes,
the first necessary characteristic of games
(and serious games) is that they provide the
user with an array of meaningful choices.

2.3.2  Sandbox  Experiences 

The next counterexample is
the open sandbox experience. These types
of programs are commonly used in military
and police training. They are programs that
create worlds in which typically large groups
of individuals engage in educational or
learning scenarios.

Such programs have been in use for several
decades. One of the first was SIMNET,
which was developed and deployed in the
early to mid 1980s [14]. For many years,
such programs have used specialized
hardware and software, which was
inexpensive in comparison to the resource
cost of conducting the training using
real-world equipment and locations.

Recently, however, the commercial
entertainment software industry has proven
that high-quality experiences can be
delivered on commercially available
hardware and software, and that these
experiences can be delivered for even less
than the cost of simulation using specialized
hardware and software.

One recent development in this field is the
US Army’s adoption of a program called
Virtual Battlespace 2 (VBS2) [15]. VBS2 is
developed by Bohemia Interactive, which
was previously a commercial entertainment
company. Bohemia developed a
number of games, including a tactical virtual
reality shooting game called Operation
Flashpoint. Unlike earlier training
simulations using specialized software and
hardware, VBS2 is based on the Operation
Flashpoint game and is designed to run on
common desktops and laptops. Because
the program shares software technology
with Bohemia Interactive’s new game,
Operation Flashpoint II, there is a natural
tendency to call VBS2 a serious game.

However,  VBS2  is  not  a game  in  the 
traditional  sense.  It  is  designed  to  host 
large  numbers  of  networked  users  in  a 
virtual  environment.  In  game  terms,  it  would 
be  considered  a large  multiplayer  game. 
Unlike  games,  however,  this  experience  is 
not  a closed  system.  The  multiplayer 
sessions  don’t  have  scores,  or  objectives 
defined  within  the  game.  They  lack  the 
framework  associated  with  even  freeform 
multiplayer  games.  Instead,  VBS2  sessions 
are  designed  to  be  administered  by  teams 
of  instructors,  who  then  take  on  the  role  of 
giving  the  players  objectives,  assessing 
their  performance  and  providing  them  with 
feedback. In commercial terms, this would
be considered an open world experience,
similar to Second Life. Players can interact,
but there is no game structure to support
the interactions.

In  commercial  games,  a player  can  play  the 
game  entirely  by  himself  or  herself.  This 
concept  even  applies  to  multiplayer  games. 
A player  sitting  down  with  the  commercial 
version  of  the  game  Operation  Flashpoint 
can  play  the  game  by  themselves,  even 
when  engaging  in  multiplayer  sessions. 

They  can  join  a multiplayer  session  with  no 
prior  coordination,  play  the  game,  and  then 
quit  when  they  desire.  Most  importantly,  the 
game  provides  internal  feedback  loops,  via 
scores  and  other  performance  measures,  to 
let  the  player  know  how  well  they 
performed.  Granted,  in  games  which  are 
exclusively  multiplayer  in  nature,  this 
assumes  a robust  network  structure  with 
available  sessions,  but  given  that 
assumption,  there  is  no  overhead  to  playing 
the  game  other  than  the  player  and  the 
game  itself.  VBS2  and  other  serious  games 


like  it  lack  this  fundamental  game 
characteristic. 

If  VBS2  did  have  those  qualities,  then  users 
could  train  at  their  own  pace,  learning  the 
materials  as  appropriate  to  their  individual 
skills.  Targeting  training  at  the  individual 
level  could  greatly  increase  the  user 
engagement  and  ultimately  the 
effectiveness  of  the  training.  No  longer 
would  quick  learners  be  held  up  by  the  slow 
members  of  traditional  classes  or  training 
groups,  nor  would  the  slow  learners  be 
dragged  along  faster  than  they  can 
assimilate the material. If the instructional
framework were properly embedded, then
the serious game would be a
self-encapsulated experience, just as
commercial video games are.

Therefore, the second quality that a serious
game should have is that it is a
self-encapsulated experience. Given that this
discussion has been centered on video
games, it is worth noting that this quality
would not exclude serious games that are not
played on a computer. Because of their
automation advantages, computers facilitate
this characteristic. However, it is possible to
have board or card and paper games, or
even athletic games, that are playable solo
as well as with other people. Granted,
these games are rare, and the difficulty in
creating them is much higher than for
traditional board games, but they are not
impossible to conceive of.

2.4  The  Issue  of  Fun 

A few of the definitions of games, such as
those of Caillois and Zyda, also refer to
entertainment or fun. Many of the
definitions omit such concepts, perhaps due
to their highly subjective nature. However,
as a basic metric of value for games, it is
undeniable that these are fundamental
aspects of the concept.

Customers pay money to play games not
because the games provide some sort of
reward, but because the game experience
itself is rewarding. The intrinsic value of
the game experience outweighs any
extrinsic benefit bestowed by playing the
game. The games, without any
consideration of outside benefit, are
intrinsically compelling.

“Intrinsically compelling” encompasses the
concept of “fun”, but it also makes room for
other aspects, such as the satisfaction of
overcoming a challenge, or earning a
reward, which may not be entirely fun. For
example,  many  Massively  Multiplayer 
Online  Role-Playing  Games  (MMORPGs), 
like  World  of  Warcraft  [16],  routinely  include 
what  player  communities  often  refer  to  as 
“grinding”.  World  of  Warcraft  is  a game  in 
which  players  try  to  improve  their  character 
by  accomplishing  various  tasks  and  thereby 
gaining  experience  points.  Grinding  is  a low 
risk  way  to  gain  experience  points.  It  is  a 
repetitive  act,  which  is  widely  described  in 
negative  terms,  often  involving  tediously 
killing  large  numbers  of  weaker  enemies 
which  pose  little  threat.  Killing  these  weak 
enemies  might  be  a boring  and  repetitive 
task,  but  it  gains  points  which  grant  the 
player  some  form  of  in-game  reward,  such 
as  a more  powerful  character.  It  is  a 
significant  component  of  this  game  genre 
which  is  not  normally  deemed  as  fun.  Yet  it 
is  intrinsically  compelling. 

Regardless of what creates that
intrinsic compulsion, be it fun, the feeling
of achievement, or some other factor,
intrinsic compulsion and educational value
do not necessarily compete. Raph Koster
proposes that fun is, in fact, the brain’s
reaction to learning [10]. Keeping in mind
the paradigm that players are always
learning when they play, it follows that the
two are closely interrelated.

Even from a practical standpoint, a player
who feels intrinsically compelled by a
serious game is more likely to engage with
the game and therefore, assuming the
serious game is designed well, more likely
to achieve the desired “non-entertainment”
purpose.


Thus,  the  third  quality  of  a serious  game  is 
that  it  should  be  intrinsically  compelling. 

3.0  DISCUSSION 

Based  on  these  concepts,  the  authors  have 
been  developing  a Serious  Game  entitled 
Elemental  Solitaire™.  The  effort  to  develop 
the  game  began  simply  enough  with  the 
idea  to  build  a combination  of  popular 
classic  card  game  mechanics,  such  as 
solitaire,  and  the  periodic  table. 

Because  of  the  nature  of  this  project,  it  was 
more  likely  to  fall  prey  to  the  “Computer 
Based  Training”  pitfall  than  the  “Sandbox 
Experience”  pitfall.  The  mechanics  of  the 
game  lent  themselves  well  to  creating  an 
encapsulated  game-play  experience. 
However,  avoiding  the  “quiz-like”  danger 
required  more  effort. 


Fig 1. Elemental Solitaire Game Screen

In  early  iterations  of  the  game,  the  simplest 
designs  precluded  any  such  meaningful 
choice.  As  the  players  were  given 
elements,  they  either  placed  them  correctly, 
or  they  were  penalized  for  failing  to  do  so. 
Though  the  program  had  a graphic  interface 
and  the  elements  had  an  appearance  of 
playing  cards,  there  was  no  conceptual 
space  to  explore.  The  program  simply 
presented  the  user  with  quiz  questions 
disguised  in  a graphical  form.  Though 
different  in  execution,  this  program,  in  spirit, 
resembled  many  of  the  aforementioned 
online quizzes. In order to add more
decision space to explore, three items were
added  to  provide  the  user  with  more 
meaningful  choices. 

First,  players  were  given  ten  ‘skips’,  so  if 
they  wanted  to  delay  placing  an  element, 
they  had  the  ability  to  do  so  ten  times, 
without  penalty.  This  gives  the  player  a 
small  degree  of  control  in  choosing  whether 
to  place  an  element  or  not.  It  also  adds  a 
measure of strategic depth, as players must
ration their skips, weighing the benefit of
skipping the current element against the
need to be able to skip an element later on.

Second,  when  placing  an  element,  the 
players  are  given  a countdown  timer.  As 
the  time  passes,  hints  as  to  the  correct 
element  position  on  the  table  are 
automatically  given,  including  the  family 
color,  the  row,  and  ultimately  the  actual 
element  position.  The  balance  is  that  the 
score  the  player  receives  for  placing  a card 
decreases  as  more  hints  are  provided,  until, 
ultimately,  no  points  are  awarded  if  the 
card’s  correct  position  is  shown  to  the 
player.  A player  can  also  choose  to 
capitalize  on  this,  and  if  they  are  willing  to 
take  a lower  reward,  can  even  manually 
advance  the  timer  to  display  the  next  hint, 
without  having  to  wait.  Again,  this  mechanic 
allows the player to decide how to balance risk
and  reward.  As  risk  diminishes,  so  does  the 
reward. 

Lastly,  as  mentioned  above,  an  abstract 
scoring  mechanic  was  added  to  the  game. 
This  system  rewards  the  player  with  a 
specified  amount  of  points  for  each  element 
correctly  placed,  and  deducts  an  amount  for 
incorrect placements. The amount awarded
for  correct  placement  is  inversely 
proportional  to  the  time  taken  to  place  the 
element.  Additionally,  not  only  do  players 
get  rewarded  for  correct  answers,  but  they 
also  increase  a score  multiplier,  which  links 
their  decisions  on  how  they  choose  to 
answer  questions  with  the  rewards  they  can 
receive on future correct answers. As they
score better, their multiplier increases,
allowing them to score even higher on
subsequent  correct  placements.  This  factor 
also  adds  significance  to  the  decision  the 
player  must  make  in  balancing  the  risk  and 
reward  of  placing  the  elements. 
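
The three mechanics can be summarized in a small sketch. The paper gives no numeric values, so the point amounts, the hint factors, and the rule that a wrong placement resets the multiplier are all assumptions made here for illustration; only the ten skips and the zero award once the exact position is revealed come from the text.

```python
from dataclasses import dataclass

BASE_POINTS = 100   # hypothetical value; the paper gives no numbers
PENALTY = 25        # hypothetical deduction for a wrong placement
# Fraction of the base award kept after 0, 1, 2 or 3 hints
# (none, family color, row, exact position). The first three values
# are assumed; a fully revealed position earns nothing, as in the text.
HINT_FACTORS = (1.0, 0.6, 0.3, 0.0)


@dataclass
class GameState:
    score: float = 0.0
    multiplier: int = 1
    skips_left: int = 10  # ten penalty-free skips, as in the text


def skip(state: GameState) -> bool:
    """Use one skip if any remain; return whether the skip was allowed."""
    if state.skips_left == 0:
        return False
    state.skips_left -= 1
    return True


def place(state: GameState, correct: bool, hints_shown: int) -> None:
    """Update score and multiplier after one placement attempt."""
    if correct:
        award = BASE_POINTS * HINT_FACTORS[hints_shown] * state.multiplier
        state.score += award
        state.multiplier += 1   # assumed growth rule
    else:
        state.score -= PENALTY
        state.multiplier = 1    # assumed reset rule
```

A player who asks for more hints lowers `HINT_FACTORS[hints_shown]` and therefore the award, which is the risk and reward trade-off described above.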

These  three  mechanics  of  the  game 
combine  to  create  a decision  space  for  the 
player  to  explore,  and  extend  the  game 
space  beyond  simply  entering  a right  or 
wrong answer. These "choice-creating
factors" possess neither the depth nor the
scale of the types of decisions often made in
a highly  strategic  game,  such  as  Sid  Meier’s 
Civilization  games,  but  they  do  create  a 
small  space  for  the  learner  to  explore. 

These  factors  set  the  conditions  for  the 
psychosocial  moratorium,  and  presumably 
improve  the  facilitation  of  learning.  With  an 
array  of  possible  actions,  the  player  can 
play  how  they  like,  and,  in  the  words  of 
game  designer  Chris  Crawford,  imprint  their 
own  personality  on  the  game  [17].  Fig.  2 
shows  a screenshot  from  a game  in  play, 
with  a hint  showing  the  family  and  row  of  the 
element. 


Fig  2.  Game  Screen  with  Hints 

Additional  discussion  of  the  development  of 
Elemental  Solitaire™  can  be  found  in  the 
paper  Differentiating  Between  Serious 
Games  and  Computer  Aided  Instruction 
[18].

4.0 CONCLUSION

While  the  traditional  approach  to 
understanding  something  is  to  examine 
what  it  is,  this  paper  presents  a 
characterization  of  Serious  Games  based 
on  what  they  are  not.  From  this 
examination  of  counter-examples,  we  have 
derived  three  qualities  which  a Serious 
Game  should  possess: 

1. They provide meaningful choices to
the user.

2. They are self-encapsulated
experiences. 

3.  They  are  intrinsically  compelling. 

Elemental Solitaire™ is an example of how
these  characteristics  can  be  used  to  guide 
the  development  of  a Serious  Game.  This 
program  is  going  to  be  used  as  a test 
platform  to  assess  the  effectiveness  of 
these  and  other  game  design  principles  in 
developing Serious Games.

Abstract. 3D road models are widely used in many computer applications such as racing games and driving simulations. However,
almost all high-fidelity 3D road models were generated manually by professional artists at the expense of intensive labor. There are
very few existing methods for automatically generating 3D high-fidelity road networks, especially those existing in the real world.
This paper presents a novel approach that can automatically produce 3D high-fidelity road network models from real 2D road GIS
data that mainly contain road centerline information. The proposed method first builds parametric representations of the road
centerlines through segmentation and fitting. A basic set of civil engineering rules (e.g., cross slope, superelevation, grade) for road
design are then selected in order to generate realistic road surfaces in compliance with these rules. While the proposed method
applies to any type of road, this paper mainly addresses automatic generation of complex traffic interchanges and intersections,
which are the most sophisticated elements in road networks.


1.0  INTRODUCTION 

Road  networks  are  critical  infrastructures  of 
human  civilization  and  probably  the  most 
important  means  of  transportation.  With 
advances  in  computing  technologies,  2D 
and  3D  road  models  have  been  employed  in 
many  applications,  such  as  computer 
games  and  virtual  environment  construction. 
Roads  are  complex  3D  structures  and 
traditional  road  models  were  generated  by 
professional  artists  manually  using  modeling 
software  tools  such  as  Maya  and  3ds  Max. 
This  approach  requires  both  highly 
specialized  and  sophisticated  skills  and 
massive  manual  labor.  Procedural 
modeling  based  automatic  road  generation 
methods  [1-6]  create  road  models  using 
specially  designed  computer  algorithms  or 
procedures  and  they  can  dramatically 
reduce  or  eliminate  the  amount  of  manual 
editing  needed  for  road  modeling.  However, 
most existing procedural road modeling
methods aim at the visual quality of the
generated roads, not the geometric or
architectural fidelity that largely determines
the driving experience.

Geographic  information  systems  (GIS)  are 
computer-based  systems  widely  used  to 
store,  manipulate,  display,  and  analyze 
geographic  information  in  many  fields.  With 
the  rapid  development  and  widespread  use 
of  GIS  techniques,  vast  GIS  data  are 
captured  through  various  digital  data 
collection  methods  such  as  remote  sensing 
via  cameras,  digital  scanners  and  LIDAR. 


Road GIS data, which record information
about the real road network, are one such
kind of data. They are used in many
applications, e.g., automotive navigation
systems. Since real road GIS data contain
road network information that is
indispensable for some applications,
especially transportation, homeland security
and defense applications, modeling the 3D
road network directly and automatically from
road GIS data would significantly reduce
both time and labor cost. However, there
are very few existing methods for GIS-based
automatic generation of high-fidelity 3D road
networks.
Most  GIS  based  modeling  work  and 
software  focus  on  buildings  [7,  8], 
vegetation  and  rural  landscape  visualization 
[9-11]  rather  than  roads. 

Therefore,  a method  that  can  automatically 
produce  3D  high-fidelity  road  network 
models  from  real  road  GIS  data  will  greatly 
benefit  numerous  applications  involving 
road  networks.  This  paper  addresses  this 
problem  by  proposing  a novel  method  which 
is  used  in  an  ongoing  project  to 
automatically  generate  3D  high-fidelity  road 
network  models  from  existing  road  GIS  data 
in  compliance  with  a set  of  selected  civil 
engineering  rules.  The  proposed  method 
consists  of  several  steps,  including  road 
GIS  data  preprocessing,  road 
representation  parameterization,  civil 
engineering rules based road surface
modeling, and intersection and interchange
generation. 

This  paper  is  organized  as  follows:  section 
2 describes  each  step  of  the  proposed 
method  in  detail;  section  3 discusses  the 
application  fields  and  advantages  of  the 
proposed  method,  and  finally  this  paper  is 
concluded  in  section  4. 

2.0  BODY 

In  this  section,  the  whole  road  network 
modeling  method  is  described  using  a road 
network  which  contains  several  road 
segments,  road  intersections  and  a traffic 
interchange  as  shown  in  Figure  1(a)  as  an 
example. 

2.1  Road  GIS  Data  Preprocessing 

2.1.1  GIS  Data  Import 

Although  various  data  formats  have  been 
developed  for  GIS  applications,  such  as 
GML (Geography Markup Language) and
TIGER  (reference  file  format),  shapefile 
format  is  the  most  widely  used  format  and  it 
is  utilized  in  this  research.  A typical 
shapefile  consists  of  a main  file,  an  index 
file,  a dBase  file  and  a projection  file. 

Among  them,  the  first  three  files  define  the 
geometry  and  attributes  in  a shapefile. 
Three  kinds  of  shape  types  are  used  to 
represent  geometric  shape  features,  which 
are  point,  polyline  and  polygon  [12].  The 
main  file  contains  the  primary  reference 
data  with  one  record  per  shape  feature;  the 
index  file  stores  the  position  and  content 
length  for  each  record  in  the  main  file;  the 
dBase  file  contains  the  feature  attribute  for 
each  record  in  the  main  file.  A library 
named  shapelib  [13]  is  used  in  this  project 
to read data from shapefiles.
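
To make the main-file layout concrete, the following sketch decodes the fixed 100-byte header of a shapefile main file using only the Python standard library. It is not the shapelib API used in the project; the function name `parse_shp_header` is introduced here for illustration, and only the header (not the records) is read.

```python
import struct

# Shape type codes defined by the shapefile format.
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def parse_shp_header(header: bytes) -> dict:
    """Decode the 100-byte header of a .shp main file."""
    if len(header) < 100:
        raise ValueError("a shapefile main-file header is 100 bytes long")
    # File code and file length are big-endian integers.
    (file_code,) = struct.unpack(">i", header[0:4])
    if file_code != 9994:
        raise ValueError("not a shapefile main file")
    (length_words,) = struct.unpack(">i", header[24:28])
    # Version, shape type and bounding box are little-endian.
    version, shape_type = struct.unpack("<ii", header[28:36])
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])
    return {
        "length_bytes": length_words * 2,  # length is stored in 16-bit words
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_type, "Other(%d)" % shape_type),
        "bbox": (xmin, ymin, xmax, ymax),
    }
```

For road centerline data the reported shape type would be expected to be `PolyLine`; a typical call is `parse_shp_header(open(path, "rb").read(100))`, where `path` is the caller's own file.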

2.1.2  Road  Network  Topology 
Extraction 

Since  in  existing  road  GIS  data,  roads  are 
represented  as  2D  (latitude  and  longitude) 
centerlines  in  the  form  of  polylines,  i.e., 
connected  line  segments  that  consist  of 
consecutive  but  discrete  road  centerline 
points  as  shown  in  Figure  1(b),  road 


network  topology  or  connectivity  information 
is  not  explicitly  represented,  that  is,  it  is 
difficult  to  determine  if  two  polylines  are 
connected  without  explicitly  comparing  the 
points  that  compose  the  polylines.  To 
expedite  the  automatic  road  generation 
process  and  to  facilitate  road  navigation,  an 
explicit  representation  of  the  road  network 
or  topology  is  necessary.  In  this  research,  a 
method  proposed  in  [14]  was  employed  to 
extract  the  topology  information  of  the  road 
network  from  raw  road  GIS  data.  The 
output  of  this  method  is  a road  network 
represented as a graph composed of
intersections  (nodes)  and  road  polylines 
(links)  as  shown  in  Figure  1(c). 
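
The node and link structure can be illustrated with a minimal sketch that turns polyline end points into graph nodes. This is not the method of [14], whose details are not given here; it assumes that connected polylines share exactly matching end points (after rounding), and the function name and rounding precision are hypothetical.

```python
from collections import defaultdict


def build_road_graph(polylines, precision=6):
    """Build a node/link graph from polylines given as lists of (x, y).

    Returns (node_ids, links, adjacency):
      node_ids  maps a rounded end point to an integer node id,
      links     holds one (start_node, end_node) pair per polyline,
      adjacency maps a node id to the indices of the links touching it.
    """
    node_ids = {}
    links = []
    adjacency = defaultdict(list)
    for link_index, points in enumerate(polylines):
        ends = []
        for x, y in (points[0], points[-1]):
            key = (round(x, precision), round(y, precision))
            # Reuse the node if this end point was seen before.
            ends.append(node_ids.setdefault(key, len(node_ids)))
        links.append((ends[0], ends[1]))
        adjacency[ends[0]].append(link_index)
        adjacency[ends[1]].append(link_index)
    return node_ids, links, dict(adjacency)
```

With the graph in hand, testing whether two links are connected reduces to comparing node ids instead of comparing every point of the two polylines.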

2.1.3  Road  Network  Simplification 

The  raw  road  GIS  data  contain  redundant 
representations  for  road  links  that  have 
multiple  names.  That  is,  the  same  road  with 
two  or  more  different  names  is  stored  as 
two  or  more  independent  roads  in  the  raw 
road  GIS  data.  While  this  redundancy  might 
be  useful  for  other  purposes,  the  same 
physical  road  should  have  only  one  3D 
representation  in  this  project.  Hence  road 
links  with  the  same  physical  positions  but 
different  names  are  combined  into  one  road 
link  with  multiple  names,  eliminating  the 
redundancy  of  the  network  representation. 
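
A minimal sketch of this merging step follows. It assumes that redundant links have identical point sequences (possibly stored in opposite directions); the tuple-based link format and the function name are illustrative only, not the project's data structures.

```python
def merge_duplicate_links(links):
    """Merge links that share the same geometry.

    links is a list of (name, points), where points is a list of (x, y)
    tuples. Returns a list of (names, points) with one entry per
    physical road.
    """
    merged = {}
    for name, points in links:
        forward = tuple(points)
        # Use a direction-independent key so that A->B and B->A match.
        key = min(forward, forward[::-1])
        merged.setdefault(key, []).append(name)
    return [(sorted(set(names)), list(key)) for key, names in merged.items()]
```

Real data may need a tolerance-based comparison instead of exact equality, which this sketch does not attempt.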

2.1.4  Road  Classification 

To obtain more information about the road
network, road links are classified into
different categories according to their
names, which can usually be obtained from
the road GIS data. Keywords are used for
the classification, and six road categories
are defined in this research: highway, local,
ramp, bridge, tunnel, and unknown. For
instance, the keywords used to identify
local roads include "LN", "ST", "RD", "DR",
"AVE", "BLVD" and "PKWY". The keywords
used for road classification in this project
are listed as follows:

• Highway: "I-" (e.g. I-64), "HWY".

• Local: " LN", " ST", " RD", " DR", "AVE",
"BLVD", "PKWY".

• Ramp: "RAMP", 0-9+A-Z (e.g. 14B).

• Bridge: "BRIDGE".

• Tunnel: "TUNNEL".

• Unknown: "NULL".

Figure 1. A road network example which contains several road segments, road intersections and a
traffic interchange. (a) Satellite photo around VA-403 Norfolk, VA from USGS. (b) Road centerline
and discrete road centerline points from road GIS data shown in ArcGIS for the same area. (c)
Road network after topology extraction with extracted nodes indicated by red points.
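
The keyword rules for road classification can be sketched as a small function. The paper does not state the order in which keywords are checked or how the local-road keywords are matched, so the priority used here (bridge, tunnel, ramp, highway, local) and the treatment of local keywords as name suffixes with a leading space are assumptions.

```python
import re

# Exit-style ramp names: digits followed by one capital letter, e.g. "14B".
RAMP_EXIT = re.compile(r"^\d+[A-Z]$")
# Local-road keywords, assumed here to be name suffixes.
LOCAL_SUFFIXES = (" LN", " ST", " RD", " DR", " AVE", " BLVD", " PKWY")


def classify_road(name):
    """Return one of: highway, local, ramp, bridge, tunnel, unknown."""
    if name is None:
        return "unknown"
    n = name.strip().upper()
    if n in ("", "NULL"):
        return "unknown"
    if "BRIDGE" in n:
        return "bridge"
    if "TUNNEL" in n:
        return "tunnel"
    if "RAMP" in n or RAMP_EXIT.match(n):
        return "ramp"
    if "I-" in n or "HWY" in n:
        return "highway"
    if n.endswith(LOCAL_SUFFIXES):
        return "local"
    return "unknown"
```

Checking ramps before highways means a name such as "I-64 RAMP" is treated as a ramp, which is a design choice of this sketch rather than something the paper specifies.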

2.2  Parametric  Representation 

2.2.1  Parameterization 

As  mentioned  before,  the  road  GIS  data 
only  contain  discrete  points  of  the  road 
network in the form of polylines. This
discrete representation has several
drawbacks, the most serious being that it
cannot be rendered at arbitrary resolution.
The proposed method converts the original
discrete representation of the road network
into parametric representations, which
support arbitrary levels of detail (LOD) and
reduce memory usage. Two standard
parametric forms are defined: Straight Lines
and Circular Curves.

• Straight  Line  (Line):  A straight  line 
connecting  two  points  and  specified 
by  its  ID,  start  point,  end  point,  start 
side  vector  and  end  side  vector. 

• Circular Curve (Curve): Part of a
circle and specified by its ID,
position of center point, radius, start
point, end point, start side vector,
end side vector, start angle, end
angle and direction (clockwise /
anticlockwise).

Among  these  parameters,  side  vectors 
indicate  the  extension  direction  of  road 
surface.  This  representation  of  road 
centerline  data  has  several  advantages. 
First  of  all,  it  is  relatively  simple  and  easy  to 
understand and implement. More
importantly, it is well suited to applying civil
engineering principles, especially
superelevation generation for curved roads,
for which the center point and radius are
required.
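
The two parametric forms can be sketched as simple data classes whose `sample` method produces centerline points at any requested resolution. The field names are illustrative, side vectors are omitted for brevity, and angles are assumed to be in radians; none of this is taken from the project's actual code.

```python
import math
from dataclasses import dataclass


@dataclass
class Line:
    id: int
    start: tuple
    end: tuple

    def sample(self, n):
        """Return n + 1 evenly spaced points from start to end."""
        (x0, y0), (x1, y1) = self.start, self.end
        return [(x0 + (x1 - x0) * i / n, y0 + (y1 - y0) * i / n)
                for i in range(n + 1)]


@dataclass
class Curve:
    id: int
    center: tuple
    radius: float
    start_angle: float
    end_angle: float
    clockwise: bool = False

    def sample(self, n):
        """Return n + 1 points along the arc in the stored direction."""
        sweep = self.end_angle - self.start_angle
        if self.clockwise and sweep > 0:
            sweep -= 2 * math.pi
        if not self.clockwise and sweep < 0:
            sweep += 2 * math.pi
        cx, cy = self.center
        return [(cx + self.radius * math.cos(self.start_angle + sweep * i / n),
                 cy + self.radius * math.sin(self.start_angle + sweep * i / n))
                for i in range(n + 1)]
```

Because each segment stores only a handful of numbers, choosing a larger `n` yields a finer level of detail without storing more data.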

2.2.2  Segmenting  and  Fitting 

In  order  to  divide  the  road  links  into 
segments  that  can  be  represented  in 
standard  segment  forms,  three  types  of 
critical  points  are  identified  to  segment  the 
road  polylines:  acute  turn,  s-turn  and  turn 
start/end  [15]  based  on  their  geometric 
features.  After  the  critical  points  are 
identified,  road  polyline  links  are  partitioned 
into  a set  of  segments  that  are  groups  of 
discrete  points  ready  for  segment  fitting  to 
obtain  their  appropriate  analytic 
representations. Then least squares
methods [15] are employed to fit these road
segments to straight lines and circular
curves optimally. Based on the parametric
representation of the road network, different
levels of detail can be employed to
generate the road model.
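
As an illustration of fitting a circular curve to a group of centerline points, the sketch below uses a well-known algebraic least squares circle fit (often attributed to Kåsa). The fitting method of [15] may differ; the function names are introduced here for illustration.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_circle(points):
    """Fit x^2 + y^2 + a*x + b*y + c = 0 to (x, y) points.

    Returns (center_x, center_y, radius).
    """
    n = len(points)
    sx = sum(x for x, y in points)
    sy = sum(y for x, y in points)
    sxx = sum(x * x for x, y in points)
    syy = sum(y * y for x, y in points)
    sxy = sum(x * y for x, y in points)
    sz = sum(x * x + y * y for x, y in points)
    sxz = sum(x * (x * x + y * y) for x, y in points)
    syz = sum(y * (x * x + y * y) for x, y in points)
    # Normal equations of the linear least squares problem.
    lhs = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [-sxz, -syz, -sz]
    d = det3(lhs)
    if abs(d) < 1e-12:
        raise ValueError("points are (nearly) collinear; fit a line instead")
    solution = []
    for col in range(3):  # Cramer's rule
        m = [row[:] for row in lhs]
        for r in range(3):
            m[r][col] = rhs[r]
        solution.append(det3(m) / d)
    a, b, c = solution
    cx, cy = -a / 2, -b / 2
    return cx, cy, math.sqrt(cx * cx + cy * cy - c)
```

The collinear case is rejected explicitly, since such a segment would be represented as a Straight Line rather than a Circular Curve.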

2.3  Civil  Engineering  Rule  Based 
Road  Surface  Modeling 

Since  the  road  surface  is  the  most 
significant  feature  of  a road  and  its  quality 
dramatically  affects  our  driving  experience, 
principles  and  rules  on  road  surface  design 
are of utmost importance. To model the
road surface realistically, a basic set of civil
engineering rules for road design is
selected, including design speed, cross
slope, superelevation, and grade. The road
surfaces  will  be  generated  in  compliance 
with  these  civil  engineering  rules.  Among 
all  of  them,  normal  cross  slope  for  most 
road  surfaces  and  superelevation  for  some 
curved  road  surfaces  are  two  major  factors 
directly  determining  the  main  shape  of  the 
road surfaces. In addition, methods for
pavement modeling and rendering are also
discussed in this section.

2.3.1  Normal  Cross  Slope 

A slope across the roadway cross section is
employed to meet drainage needs, directing
water off the traveled way to benefit road
users and reduce accident potential.
According  to  American  Association  of  State 
Highway  and  Transportation  Officials 
(AASHTO)  standards  [16],  a plane  model 
with  a peak  in  the  middle  and  a 2%  cross 
slope  downward  toward  both  edges  is 
preferred  in  this  research  for  normal  road 
surface  modeling  as  shown  in  Figure  2. 
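
The normal crown can be sketched as a cross-section profile in which elevation drops linearly with distance from the centerline. The 2% slope comes from the text; the function name, the half-width parameter, and the sampling scheme are assumptions for illustration.

```python
def crown_profile(half_width, cross_slope=0.02, centerline_z=0.0, samples=4):
    """Return (lateral offset, elevation) pairs across one cross section.

    The peak is at the centerline (offset 0) and the surface falls by
    cross_slope per unit of lateral distance toward both edges.
    """
    profile = []
    for i in range(-samples, samples + 1):
        offset = half_width * i / samples
        profile.append((offset, centerline_z - cross_slope * abs(offset)))
    return profile
```

For example, with a half width of 3.6 (in the same length unit as the elevations), each edge lies about 0.072 below the centerline.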

2.3.2  Superelevation 

The superelevation of a curved road
segment is used to counteract the
centrifugal force that pushes a vehicle
traveling on the curve outward, balancing it
with gravity and side friction according to the
laws of mechanics. Based on civil
engineering rules [16], the superelevation
rate can be determined from the side friction
factor, road radius and vehicle speed. In this
research, the standard superelevation rates
suggested by [16] are used for different
ranges of curve radius, road types and
surface conditions.
The  centerline  of  the  roadbed  is  used  as  the 
axis  of  rotation  for  superelevation  and  an 
illustration  of  a superelevation  transition 
from  normal  crown  to  full  superelevation  can 
be  found  in  Figure  2. 
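
The relationship between superelevation rate, side friction, radius and speed can be illustrated with the simplified curve formula in metric units, e + f = V^2 / (127 R), with V in km/h, R in meters and e expressed as a decimal. The paper itself uses the tabulated standard rates from [16] rather than this direct calculation, and the cap `e_max` and the example friction values are assumptions.

```python
def superelevation_rate(speed_kmh, radius_m, side_friction, e_max=0.08):
    """Superelevation rate (decimal) from the simplified curve formula.

    e = V^2 / (127 * R) - f, clamped to the range [0, e_max].
    """
    e = speed_kmh ** 2 / (127.0 * radius_m) - side_friction
    return min(max(e, 0.0), e_max)


def edge_offset(rate, lateral_offset):
    """Elevation change at a lateral offset when the cross section is
    rotated about the centerline by the given superelevation rate
    (positive offsets are toward the outside of the curve)."""
    return rate * lateral_offset
```

Very large radii give a rate of zero, in which case the normal crown would apply, while very tight curves are capped at `e_max`.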


Figure 2. An illustration of a normal crown to
full superelevation transition

2.3.3  Pavement 

Road  surfaces  contain  different  types  of 
pavements  such  as  asphalt  and  concrete. 
Several  methods  can  be  used  to  render  the 
road  surface  (pavements),  which  are  texture 
mapping,  programmable  pixel  shaders,  and 
programmable  vertex  shaders.  Besides 
visual  simulation  of  the  road  surfaces,  the 
geometry  of  the  road  surfaces  can  be 
further  modified  to  reflect  the  variations  on 
the  road  surface  using  programmable  vertex 
shaders  by  adjusting  the  vertex  positions. 
For  instance,  subtle  variations  of  the  road 
surface  can  be  achieved  by  applying  Perlin 
Noise  and  significant  changes  of  the  road 
conditions  can  be  obtained  through  addition 
of  holes  to  the  road  surface.  We  have 
generated  a series  of  mathematical  models 
for  modeling  different  kinds  of  craters  on 
planetary surfaces [17]. These craters can
be  further  modified  to  simulate  the  wear  and 
tear  of  road  surface. 
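
The idea of adding subtle height variations with Perlin Noise can be sketched on the CPU with a one-dimensional gradient noise function. The paper applies such changes in vertex shaders; this standalone version, its function names, and the amplitude and frequency values are assumptions for illustration only.

```python
import math
import random


def make_perlin_1d(seed=0, table_size=256):
    """Return a 1D Perlin (gradient) noise function."""
    rng = random.Random(seed)
    gradients = [rng.uniform(-1.0, 1.0) for _ in range(table_size)]

    def noise(x):
        i0 = math.floor(x)
        t = x - i0
        g0 = gradients[i0 % table_size]
        g1 = gradients[(i0 + 1) % table_size]
        # Contribution of each lattice gradient at position x.
        d0 = g0 * t
        d1 = g1 * (t - 1.0)
        # Perlin's quintic fade curve 6t^5 - 15t^4 + 10t^3.
        u = t * t * t * (t * (t * 6 - 15) + 10)
        return d0 + u * (d1 - d0)

    return noise


def perturb_heights(heights, amplitude=0.01, frequency=0.35, seed=0):
    """Add small noise-based offsets to a list of vertex heights."""
    noise = make_perlin_1d(seed)
    return [z + amplitude * noise(i * frequency) for i, z in enumerate(heights)]
```

The noise is zero at integer lattice positions and varies smoothly in between, so neighboring vertices receive similar offsets.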

2.4  Intersection  and  Interchange 

2.4.1  Intersection 

The method discussed in [18] for junction
synthesis was adopted, expanded and
finally integrated into this research to
generate the road intersection. Compared
with the original method, which only
considered synthesizing junctions from
connected straight line road segments, our
expanded method produces junctions from
both straight and curved road segments with
the help of side vectors. The resulting road
intersection is represented by several
parameters, including the position of the
intersection center (node), the end point of
the centerline and two boundary points for
each connected segment. This process
produces smooth transitions between road
segments.

Figure 3. 3D example road network with generated interchange (overlapped positions are
indicated by green points). (a) Top view. (b) Side view.

2.4.2  Interchange 

Road  interchanges  are  special  road 
intersections  that  combine  ramps  and  grade 
separations  at  the  junction  of  two  or  more 
highways  in  order  to  reduce  or  eliminate 
traffic  conflicts,  improve  driving  safety  and 
increase traffic capacity. Traffic interchange
modeling is the most critical and challenging
component of road network modeling
because of its complexity and a data
deficiency: existing road GIS data are 2D
and do not contain any height information
(vertical position). Hence, to generate 3D
models of the interchanges, the overlapped
positions in the interchanges are identified
first, and appropriate elevations are then
assigned to road links so that road links do
not intersect or collide with each other. In


detail, since road intersections in the road
GIS data generally occur only at the end
points of road links (polylines), if two road
polylines contain the same point that is not
the end point of either polyline or both
polylines, this point is a location where
the two road polylines overlap. After the
overlapped positions are identified, the
elevations of the overlapped road links are
estimated based on mathematical
formulations combined with observation of
real traffic interchanges. The concept of
"elevation level" is used to roughly
represent the different heights of road links;
greater level values correspond to
higher elevations. Finally, after determining
the  elevation  level  for  each  overlapped  road 
point,  absolute  elevations  will  be  calculated 
based  on  the  terrain  elevation  and  level 
height,  and  linear  interpolation  is  used  to 
compute  the  elevations  for  road  points 
located  between  two  overlapped  positions. 
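
The overlap detection and elevation interpolation steps can be sketched as follows. The sketch reads the overlap rule as "a point shared by two polylines and interior to both", which is one interpretation of the text; the level height of 6.0, the index-based (rather than distance-based) interpolation, and all function names are assumptions, and the assignment of elevation levels itself is not shown.

```python
LEVEL_HEIGHT = 6.0  # hypothetical height of one elevation level


def find_overlaps(polylines):
    """Return points that are interior points of two or more polylines."""
    users = {}
    for index, points in enumerate(polylines):
        for p in points[1:-1]:  # skip both end points
            users.setdefault(p, set()).add(index)
    return [p for p, links in users.items() if len(links) >= 2]


def absolute_elevation(terrain_z, level):
    """Elevation of an overlapped point from terrain height and level."""
    return terrain_z + level * LEVEL_HEIGHT


def interpolate_elevations(n_points, fixed):
    """Fill elevations for a polyline of n_points vertices.

    fixed maps vertex indices (overlapped positions) to known elevations.
    Vertices between two known indices are linearly interpolated; vertices
    outside the known range keep the nearest known elevation.
    """
    known = sorted(fixed)
    result = []
    for i in range(n_points):
        if i <= known[0]:
            result.append(fixed[known[0]])
        elif i >= known[-1]:
            result.append(fixed[known[-1]])
        else:
            lo = max(k for k in known if k <= i)
            hi = min(k for k in known if k >= i)
            if lo == hi:
                result.append(fixed[lo])
            else:
                t = (i - lo) / (hi - lo)
                result.append(fixed[lo] + t * (fixed[hi] - fixed[lo]))
    return result
```

A ramp that passes under one road at vertex 0 and over another at vertex 4 would, for instance, rise linearly across the three vertices in between.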

An initial result of the interchange
generation is shown in Figure 3; the
proposed method yields reasonable relative
elevations.

2.5  Implementation 

The  proposed  methods  are  implemented  on 
the  Microsoft  XNA  platform  [19].  Road 
network  models  were  created  based  on  GIS 
data  using  the  proposed  methods  and 
rendered with various shaders. If needed,
bridges and tunnels can also be generated
with only small modifications to the common
road segment generation program.
Furthermore, for environmental terrain
modeling, digital terrain elevation data and
satellite images can be combined
seamlessly in this system to enhance the
final result. Figure 4 shows the generated
3D model for the example road network.

3.0  DISCUSSION 

Although  civil  engineering  principles  are 
emphasized  in  the  proposed  work,  it  is 
worth  mentioning  that  we  do  not  aim  at 
generating  a product  that  can  be  used  for 
real  road  design  and  construction.  Instead 
the  intended  audience  and  users  of  this 
work  are  professionals  in  the  modeling  and 
simulation  industry,  computer  and  video 
game  industry,  and  other  computer  graphics 
applications  that  require  realistic  roadway 
models.  The  purpose  is  to  provide  rapid 
and  efficient  3D  road  modeling  for  such 


applications  that  have  higher  requirements 
on  high-fidelity  road  models,  such  as  racing 
games  and  driving  simulations.  As  such, 
not  all  civil  engineering  rules  on  road  design 
will  be  utilized  in  the  proposed  work.  It  is 
also important to note that existing road GIS
data do not provide all the information
needed to generate 3D road models. For
example, without elevation (height)
information, it is difficult to determine the
exact vertical locations of the road network.
In addition, some existing roads do not
actually conform to the design standards
(especially some old roads in urban areas).
Considering all these factors, although the
3D road models produced by the proposed
method are only reasonable approximations
of real roadways and may not have exactly
the same structure as the existing ones,
they will have the fidelity and resolution
required by high-end modeling and
simulation applications.


Figure 4. Experimental results. (a) 3D road network generated for the example road network. (b)
Top view of the area indicated by the red rectangle in (a). (c) Side view of the area indicated by the
yellow circle in (a). (d)(e)(f) Different views of the interchange part of the generated 3D road
network model.

4.0 CONCLUSION

In conclusion, this paper addressed the
automatic generation of 3D high-fidelity road
network models from real road GIS data.

The proposed method applies to the
modeling of the whole road network, which
is critical for applications that have stringent
requirements on high-fidelity road networks,
such as driving and transportation
simulation. With minor modifications, the
proposed method can also be extended to
other areas, such as generation of subway
systems based on 2D subway maps.

Introduction


• Road  is  an  essential  feature  of  civilization. 

• Road  models  are  widely  used. 

- Computer  games 

- Virtual  environment  construction 

• 2D/3D  road  models  were  generated  manually. 

- Creator,  3DMax  and  Maya 

- Massive  labor  & skilled  artists 

• Procedural  modeling 

- Visual effects

- Geometric  or  architectural  fidelity 

9/2/2010  3

Introduction


• GIS:  Geographic  Information  Systems 

- Road  GIS  data 

• 3D  road  network  modeled  from  road  GIS  data 

- Directly  & automatically 

- Reduce  both  time  & labor  cost. 

- Few  existing  methods 

• Ongoing  project 

- Generate  3D  high-fidelity  road  network  models. 

- Existing  road  GIS  data 

- A set  of  selected  civil  engineering  rules 

Outline


• Introduction 

• Body 

- Road  GIS  Data  Preprocessing 

- Parametric  Representation 

- Civil  Engineering  Rule  Based  Road  Surface  Modeling 

- Intersection  and  Interchange 

- Implementation 

• Discussion 

• Conclusion 


Road GIS Data Preprocessing

• Road GIS Data Import: Shapelib

- Road: polyline consisting of consecutive
discrete 2D centerline road points

• Road Network Topology Extraction (Jakkula, 2007)

Road GIS Data Preprocessing


Figure  1.  A road  network  example  which  contains  several  road  segments, 
road  intersections  and  a traffic  interchange,  (a)  Satellite  photo  around  VA-403 
Norfolk,  VA  from  USGS.  (b)  Road  centerline  and  discrete  road  centerline 
points  from  road  GIS  data  shown  in  ArcGIS  for  the  same  area,  (c)  Road 
network  after  topology  extraction  with  extracted  nodes  indicated  by  red  points. 

Road GIS Data Preprocessing

• Road Network Simplification

- Same physical positions but different names

• Road Classification

- Highway: "I-" (e.g. I-64), "HWY".

- Local: " LN", " ST", " RD", " DR", "AVE", "BLVD",
"PKWY".

- Ramp: "RAMP", 0-9+A-Z (e.g. 14B).

- Bridge: "BRIDGE".

- Tunnel: "TUNNEL".

- Unknown: "NULL".

Parametric Representation

• Parameterization:  converts  original  discrete  road 

network  into  parametric  representations. 

• Two  Standard  Forms  (segment  units) 

- Straight  Line  (Line):  A straight  line  connecting  two 
points  (its  ID,  start  point,  end  point,  start  side  vector 
and  end  side  vector). 

- Circular  Curve  (Curve):  Part  of  a circle  (its  ID, 

position  of  center  point,  radius,  start  point,  end  point, 
start  side  vector,  end  side  vector,  start  angle,  end 
angle and direction).

Parametric Representation

• Segmenting: divides road link into segments that can be
represented  in  standard  segment  forms. 

- Three  types  of  critical  points  are  identified. 

- Acute turn, s-turn and turn start/end (Wang, 2009)

Civil Engineering Rule Based
Road  Surface  Modeling 

• Road  Surface 

- The  most  significant  feature  of  a road 

- Its  quality  dramatically  affects  driving  experience. 

• Standard 

- American  Association  of  State  Highway  and 
Transportation  Officials  (AASHTO)  standard 

• A basic  set  of  civil  engineering  rules 

- Normal  Cross  Slope 

- Superelevation 

- Pavement 

Normal Cross Slope


• Slope

- Employed to meet drainage needs and direct
water off the traveled way.

• A plane model

- A peak in the middle and a 2% cross slope downward
toward both edges is suggested and used.


Normal Crown

Superelevation


• Used to balance the centrifugal force with gravity
and side friction on curved road segments

• Superelevation rate

- Depends on the side friction factor, road radius and vehicle speed

- Standard superelevation rates are used for different
ranges of curve radius, road types and surface
conditions.
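
The relation between these three quantities is commonly written, in metric units, as 0.01e + f = V²/(127R), with e the superelevation rate in percent, f the side friction factor, V the speed in km/h and R the radius in meters. The sketch below solves that relation for e. It illustrates the underlying balance only; the slides state that standard tabulated rates are used, so this is not the authors' procedure. The function name and the 8% cap are assumptions.

```python
def required_superelevation(speed_kmh: float, radius_m: float,
                            side_friction: float, e_max: float = 8.0) -> float:
    """Superelevation rate (percent) that balances a curve.

    Solves 0.01*e + f = V^2 / (127 * R) for e and clamps the result to
    the range [0, e_max]. e_max = 8% is an assumed cap, not a value
    taken from the slides.
    """
    demand = speed_kmh ** 2 / (127.0 * radius_m)
    e = 100.0 * (demand - side_friction)
    return min(max(e, 0.0), e_max)


# Hypothetical curve: 80 km/h design speed, 250 m radius, f = 0.14.
rate = required_superelevation(80.0, 250.0, 0.14)
```

A very large radius gives a rate of zero (side friction alone is enough), while a very tight curve hits the cap, which signals that the speed or radius would have to change.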
Superelevation


Figure 2. An illustration of a normal crown to full superelevation transition.
Pavement


• Different  types  of  pavements:  asphalt  and  concrete 

- Texture  mapping 

- Pixel  shaders  & Vertex  shaders 

• Variations  on  the  road  surface 

- Subtle variations: Perlin noise

- Wear and tear: crater modeling (Wang, 2008)



Intersection 

• The  method  in  (Sun,  2004)  for  junction  synthesis 

• Produced  junctions  from  both  straight  and  curved  road 
segments. 

• The  resulting  road  intersection 

- The  position  of  the  intersection  center  (node) 

- The  end  point  of  the  centerline 

- Two  boundary  points  for  each  connected  segment 
Interchange


• Special  road  intersection  combining  ramps  and  grade 
separations  at  the  junction  of  two  or  more  highways 

• Reduce  or  eliminate  traffic  conflicts,  improve  driving 
safety  and  increase  traffic  capacity. 

• The  most  critical  and  challenging  component 
- Its  complexity  & data  deficiency 


• Identify overlapped positions in the interchange.

• Determine the elevation level for each overlapped road
point.

- Elevation  level 

- Mathematical  formulations 

- Observation  of  real  traffic  interchanges 

- Calculate  absolute  elevation  for  each  road  point. 

- Terrain  elevation  and  level  height 

- Linear  interpolation 
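
The last step can be sketched as follows. The slides give only the ingredients (terrain elevation, level height, linear interpolation), so everything else here is an assumption: the function name, the 6 m level height, treating both road ends as ground level, and interpolating the level along cumulative distance between overlapped points. Distances are assumed to be strictly increasing.

```python
def absolute_elevations(dist, terrain, levels, level_height=6.0):
    """Absolute elevation for each road point.

    dist:    cumulative distance along the road for each point
    terrain: terrain elevation at each point
    levels:  {point index: elevation level} for overlapped points
    level_height is a hypothetical height per level, in meters.
    """
    # Control points: overlapped points plus both road ends at level 0.
    control = {0: 0, len(dist) - 1: 0}
    control.update(levels)
    indices = sorted(control)

    result = []
    for i, (d, z) in enumerate(zip(dist, terrain)):
        lo = max(k for k in indices if k <= i)
        hi = min(k for k in indices if k >= i)
        if lo == hi:
            level = control[lo]
        else:
            # Linear interpolation of the level between control points.
            w = (d - dist[lo]) / (dist[hi] - dist[lo])
            level = (1 - w) * control[lo] + w * control[hi]
        result.append(z + level * level_height)
    return result


# Flat terrain at 10 m; the middle point crosses over another road (level 1).
profile = absolute_elevations([0, 50, 100, 150, 200],
                              [10, 10, 10, 10, 10], {2: 1})
```

In this example the road rises linearly to one level above the terrain at the overlapped point and descends again on the far side.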


Interchange


Figure 3. Example 3D road network with generated interchange
(overlapped positions are indicated by green points). (a) Top view.
(b) Side view.
Implementation

• Microsoft  XNA  platform 

• Rendered  with  various  shaders. 

• Bridges  and  tunnels 

• Digital  terrain  elevation  and  satellite  image 
Figure 4. Experimental results. (a) 3D road network generated for the
example road network. (b) Top view of the area indicated by the red
rectangle in (a). (c) Side view of the area indicated by the yellow circle in (a).
(d)(e)(f) Different views of the interchange part of the generated 3D road
network model.
Outline


• Introduction

• Body

- Road GIS Data Preprocessing

- Parametric Representation

- Civil Engineering Rule Based Road Surface Modeling

- Intersection and Interchange

- Implementation

• Discussion

• Conclusion
Discussion


• Not aimed at real road design and construction.

• Rapid  and  efficient  3D  road  modeling  for  computer 
graphics  applications  requiring  realistic  roadway  models 

- Racing  games  and  driving  simulations 

• Existing  data  do  not  provide  sufficient  information. 

• Existing  roads  do  not  conform  to  the  design  standards. 

• Reasonable  approximations  of  real  roadways 

• May not have exactly the same structure.


Conclusion 


• The automatic generation of 3D high-fidelity road
network models from real road GIS data

• Applies to the modeling of the whole road network.

• Can be extended to other areas.

- Subway  system 
Questions/Comments


© Jie  Wang:  iwanq016@odu.edu 
© Yuzhong  Shen:  vshen@odu.edu 


Thank  You  ! 
8.0 PAPERS FROM MODSIM WORLD
2009 (PREVIOUSLY UNPUBLISHED)


8.1 Category Learning Research in the Interactive Online Environment
Second Life

Jan  Andrews 

Vassar College Program in Cognitive Science
andrewsj@vassar.edu

Ken Livingston

Vassar College Program in Cognitive Science
livingst@vassar.edu

Joshua Sturm

Vassar College Program in Cognitive Science
josturm@vassar.edu

Daniel Bliss

Vassar College Program in Cognitive Science
daniel.p.bliss@gmail.com

Daniel Hawthorne

Vassar College Program in Cognitive Science
freedijym@gmail.com

Abstract. The interactive online environment Second Life allows users to create novel three-dimensional
stimuli that can be manipulated in a meaningful yet controlled environment. These features suggest
Second Life's utility as a powerful tool for investigating how people learn concepts for unfamiliar objects.
The first of two studies was designed to establish that cognitive processes elicited in this virtual world are
comparable to those tapped in conventional settings by attempting to replicate the established finding that
category learning systematically influences perceived similarity. From the perspective of an avatar,
participants navigated a course of unfamiliar three-dimensional stimuli and were trained to classify them
into two labeled categories based on two visual features. Participants then gave similarity ratings for
pairs of stimuli and their responses were compared to those of control participants who did not learn the
categories. Results indicated significant compression, whereby objects classified together were judged to
be more similar by learning than control participants, thus supporting the validity of using Second Life as
a laboratory for studying human cognition. A second study used Second Life to test the novel hypothesis
that effects of learning on perceived similarity do not depend on the presence of verbal labels for
categories. We presented the same stimuli but participants classified them by selecting between two
complex visual patterns designed to be extremely difficult to label. While learning was more challenging
in this condition, those who did learn without labels showed a compression effect identical to that found in
the first study using verbal labels. Together these studies establish that at least some forms of human
learning in Second Life parallel learning in the actual world and thus open the door to future studies that
will make greater use of the enriched variety of objects and interactions possible in simulated
environments compared to traditional experimental situations.


1.  Introduction 

The  study  of  how  people  acquire  and  represent 
knowledge  of  category  concepts  is  a broad  area 
of  research  in  cognitive  science  and  psychology 
that  includes  a wide  variety  of  issues  and 
approaches. The human ability to group objects
into categories and thereby treat them as
equivalent  for  certain  purposes  is  fundamental  to 
human  cognition,  providing  a foundation  for 
memory,  language,  and  reasoning. 

The process by which new category concepts
are acquired is also highly relevant to the study
of learning and therefore to the field of
education. One method of studying this process
is to teach adults (or children) unfamiliar
categories and observe resulting changes in
perceptual judgments of category instances.
Using stimuli of various kinds, both physical
objects and computer-generated images, our
laboratory and others have demonstrated that
certain effects generally occur (see, e.g., [1],
[2]): namely, objects classified together are
treated as more alike, an effect we call
compression, and/or objects classified differently
are treated as more distinctive, an effect we call
expansion.

The  broad  conception  of  category  learning 
assumed  by  this  approach  (and  the  field  more 
generally)  treats  potential  category  instances  as 
consisting  of  a set  of  values  of  various  features 
or  dimensions.  For  example,  a particular  dog 
would  have  values  on  such  dimensions  as  body 
size,  furriness,  color  of  fur,  length  of  tail,  and  so 
forth.  We  propose  that  compression  and 
expansion  effects  caused  by  learning  a new 
category  distinction  essentially  constitute  a 
change  in  the  way  these  dimensions  are 
represented,  a kind  of  warping  of  the 
psychological  dimensional  space,  such  that  the 
learner  becomes  more  sensitive  to  dimensions 
important  to  the  category  distinction  and  less 
sensitive  to  those  irrelevant  to  the  distinction. 
This  results  in  the  psychological  clustering  of 
items  that  are  classified  together,  allowing 
objects  that  differ  from  each  other  in  category- 
irrelevant  ways  to  cohere  with  one  another  and 
contrast  with  objects  clustered  in  other 
categories. 

In  a typical  category  learning  experiment, 
participants  are  shown  a series  of  items,  usually 
from  a set  of  artificial  stimuli  created  by  the 
experimenters,  and  trained  to  classify  them  into 
two  categories  by  receiving  feedback  on  their 
classification  responses.  Training  is  stopped 
when  classification  accuracy  reaches  a high 
level  or  a certain  number  of  runs  through  the 
stimuli (“blocks”) have occurred. Participants
then judge a large number of pairs of stimuli and
either rate their similarity or decide whether the
objects are identical or not, to determine how
alike or confusable objects are within or between
categories. A control group of participants
judges the same stimulus pairs in the same way
without having received classification training.

To test for compression/expansion effects, it is
necessary to use unfamiliar categories of
objects, since there can be no “control” group for
categories that are already known. In addition,
for  the  trained  group,  only  successful  learners' 
data  are  included  because  the  effects  are 
hypothesized  to  arise  from  acquisition  of  the 
category  concepts.  Interestingly,  compression 
and  expansion  appear  to  also  require 
multidimensional  objects  and  do  not  seem  to 
occur  for  objects  consisting  of  a single 
dimension  of  variation  (see  [3]  for  a famous 
counterexample,  and  [4]  for  evidence  against 
this  counterexample). 

In  order  to  ensure  that  the  learning  processes 
tapped  in  this  experimental  paradigm  are 
relevant  to  learning  in  the  real  world,  it  would  be 
very  helpful  to  apply  it  in  a more  flexible, 
dynamic,  and  interactive  environment  than  that 
of  the  standard  research  laboratory  where  static 
stimuli  are  displayed  one  at  a time  on  a flat 
computer  display  and  the  participant  enters 
responses  via  a keyboard  press.  The  research 
reported  in  this  paper  represents  an  effort  to 
recreate  this  category  learning  paradigm  in  the 
interactive  online  3-D  environment  Second  Life. 
While  some  scholars  have  discussed  the 
potential  utility  of  Second  Life  for  scientific 
research  (e.g.,  [5],  [6]),  we  are  not  aware  of  its 
previous  use  for  studying  category  learning. 
The  purpose  of  the  first  study  was  to  determine 
whether  compression/expansion  effects  would 
also  occur  in  a category  learning  task  in  Second 
Life,  in  order  to  establish  continuity  between  the 
cognitive/perceptual  processes  being  tapped  in 
standard  laboratory  tasks  and  in  Second  Life.  If 
successful,  this  study  would  support  the  use  of 
Second  Life  for  testing  new  hypotheses  related 
to  category  learning. 


2. EXPERIMENT 1

2.1  Method 

2.1.1  Participants  A total  of  44  Vassar 
College  undergraduates  either  volunteered  as 
part  of  an  introductory  psychology  class 
requirement  or  were  paid  for  their  participation. 

2.1.2  Materials  All  objects,  including  stimuli 
used  in  the  categorization  experiment,  were  built 
using the Second Life (SL) build tools (version
1.19.1 of SL). Categorization stimuli were
inspired by those used in [7]. Each of the 32
different  stimuli  was  constructed  by  first  resizing 
and  then  linking  four  separate  objects,  or 
“prims,”  in  the  language  of  SL,  to  a central 
sphere:  two  rectangular,  and  two  pyramidal  (see 
Figure  1).  The  spherical  center  of  each  object 
was  wrapped  with  the  “horizontal  stripes” 
texture.  The  four  separate  objects  were  then 
attached  to  the  central  sphere  of  each  stimulus, 
with  each  of  these  protrusions  possessing  the 
default  texture.  The  first  protrusion  was  attached 
to  the  top  of  the  sphere  and  consisted  of  an 
inverted  red,  pyramidal  prim,  linked  to  a red 
rectangular  prim.  This  top  two-part  protrusion 
was  set  to  glow  at  an  intensity  of  0.10,  to 
enhance  its  saliency.  The  second  protrusion  was 
a white  pyramidal  prim  attached  to  the  lower, 
left-hand  portion  of  the  central  sphere  and  was 
the  same  for  all  stimuli,  as  was  the  third 
protrusion,  a white  rectangular  prim  attached  to 
the  lower,  right-hand  portion  of  the  central 
sphere. 


Figure  1.  Example  of  a stimulus  used  in 
Experiment  1 


The  stimuli  varied  on  two  dimensions:  the  height 
of  the  top  protrusion  and  the  width  of  the  stripes 
covering  the  central  sphere.  There  were  eight 
values  of  each  dimension.  The  top  protrusion 
varied  in  increments  of  0.5m  between  stimuli 
and  ranged  from  2.5m  to  6.0m.  Stripe  width 
varied  from  0.4  to  13.1  repetitions  per  object.  In 
order  to  make  the  increments  of  this  variation 
perceptually  uniform,  a set  power  function  of 
1.357  was  applied.  This  implies  that  each 
increase  in  stripe  width  between  stimuli 
corresponded  to  2.5  just-noticeable-difference 
units. 

The 32 stimuli were subdivided into 16 Gexes
and 16 Zofs (nonsense labels selected for low
associability). Members of the Gex category
possessed the widest stripes and the longest top
protrusions. Members of the Zof category
possessed the narrowest stripes and the
shortest top protrusions. The stimuli were
displayed  on  a black  rectangular  background. 

A mobile  seat  (the  pink  square  in  Figure  2)  was 
also  created  and  then  programmed  using  the 
Linden  Scripting  Language  (LSL)  to  respond  to 
the  click  of  a participant’s  mouse.  The 
participant navigated this world of Gexes and
Zofs as an androgynous avatar.


Figure  2.  The  participant  avatar  and 
experimental  setting  used  in  Experiment  1 


2.1.3  Procedure  Participants  were  randomly 
assigned  to  the  learning  or  control  condition. 
They  were  tested  individually  and  entered  SL 
using  a Macintosh  computer  running  OSX 
10.4.11.  Positioned  in  front  of  the  screen, 
participants  were  told  that  they  would  first  use 
the  arrow  keys  on  the  keyboard  to  navigate  the 
avatar  through  a short  orientation  path,  in  order 
to  become  familiar  with  the  skills  needed  to 
complete  the  task.  Once  through  the  path,  the 
participants  were  asked  to  click  on  the  virtual 
seat  in  front  of  the  avatar  to  fix  the  avatar’s  view 
in  “mouse  look”  mode  (first-person  perspective). 

In  the  classification  task,  all  32  stimuli  were 
arrayed  vertically  in  a different  random  order 
against  each  of  eight  black  backgrounds,  one 
behind  the  other  (see  Figure  2).  The  same 
random  orders  were  used  across  participants. 
Participants  viewed  the  stimuli  one-by-one  while 
moving  upward  on  the  seat,  moving  past  the 
objects  displayed  against  the  black  background. 
The  seat  was  programmed  to  stop  at  each 
stimulus  location.  After  each  stimulus  was 
shown,  participants  indicated  which  category 
they  thought  the  object  belonged  to  by  clicking 
either the “Gex” or the “Zof” button that
appeared in the upper right-hand corner of the
screen. Immediately following the button press
they  received  auditory  feedback  concerning  the 
correctness  of  the  response.  Arrows  directed 
participants  to  each  successive  set  of  stimuli 
(with  a new  black  background  and  seat). 
Participants  completed  all  eight  blocks  unless 
they  achieved  a total  of  at  most  one  incorrect 
response  in  two  consecutive  blocks,  at  which 
point training was stopped immediately.
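
The stopping rule can be sketched as a small function. This is an illustration of the criterion as described, not the script that ran in Second Life; `training_outcome` and its arguments are hypothetical names, and the input is simply the number of incorrect responses in each block.

```python
def training_outcome(errors_per_block, max_blocks=8, max_errors=1):
    """Apply the learning criterion to a sequence of per-block error counts.

    Training stops as soon as two consecutive blocks contain at most
    max_errors incorrect responses in total; otherwise it runs for
    max_blocks blocks. Returns (criterion_met, blocks_completed).
    """
    n = min(len(errors_per_block), max_blocks)
    for i in range(1, n):
        if errors_per_block[i - 1] + errors_per_block[i] <= max_errors:
            return True, i + 1
    return False, n


# Hypothetical learner: errors fall quickly, criterion met after block 5.
learner = training_outcome([10, 6, 3, 1, 0, 0, 0, 0])
# Hypothetical non-learner: never reaches one error or fewer across two blocks.
non_learner = training_outcome([5, 4, 3, 2, 2, 1, 1, 2])
```

Under this rule a participant cannot meet the criterion before the second block, since two consecutive blocks are always required.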

After  the  classification  task,  participants  rode  on 
a seat  to  the  similarity  task  area.  There  they 
viewed  the  32  stimuli  in  pairs,  seeing  the  objects 
in  a given  pair  one  at  a time  for  3 seconds  each. 
Participants  rated  the  similarity  between 
members of each pair on a scale from 1 to 9 (1
being least similar and 9 being most similar) by
clicking one of nine buttons that appeared in the
upper, right-hand corner of the screen. A total of
90 pairs of stimuli (30 Gex-Gex, 30 Zof-Zof, 30
Gex-Zof) were rated, and the pairs were
presented  in  the  same  random  order  across 
participants.  The  90  pairs  were  split  into  six 
blocks  of  15  pairs;  as  above,  blocks  were 
separated  onto  different  black  backgrounds,  and 
participants  followed  arrows  to  each  successive 
block. 

Control  condition  participants  completed  only  the 
similarity judgment task, without any prior
category  learning. 

2.2 Results

Four  of  the  24  participants  in  the  learning 
condition  failed  to  pass  the  learning  criterion  - 
they  did  not  complete  two  consecutive  blocks 
with  a total  of  at  most  one  incorrect  response  in 
the  classification  task  - so  their  data  were  not 
included  in  the  analysis. 

A 2 (group:  learning  vs.  control)  by  2 (pair-type: 
members  within  the  same  category  vs.  members 
from  separate  categories)  analysis  of  variance 
with  repeated  measures  on  the  second  factor 
was  performed  on  the  similarity  ratings.  This 
yielded  a significant  main  effect  of  pair-type 
(F(1,38) = 501.724, p < .001) and a significant
interaction between group and pair-type (F(1,38)
= 7.847, p = .008). As shown in Figure 3,
within-category pairs were judged to be more similar
than between-category pairs by both groups, but
the learning group judged within-category pairs
to be more similar than did the control group and
the between-category pairs to be less similar
than did the control group, corresponding to
compression and expansion, respectively.
However, planned one-tailed t-tests revealed
that only the compression effect was significant
(t(38) = 1.748, p < .05).
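
As a reading aid, the two effects can be expressed as differences between group means. The sketch below uses made-up ratings purely to show the direction of each comparison; the numbers are hypothetical and are not the study's data, and `compression_expansion` is an illustrative name.

```python
from statistics import mean


def compression_expansion(learn_within, learn_between,
                          ctrl_within, ctrl_between):
    """Return (compression, expansion) from similarity ratings.

    compression > 0: learners rate within-category pairs as more similar
                     than controls do.
    expansion > 0:   learners rate between-category pairs as less similar
                     than controls do.
    """
    compression = mean(learn_within) - mean(ctrl_within)
    expansion = mean(ctrl_between) - mean(learn_between)
    return compression, expansion


# Hypothetical mean ratings on the 1-9 scale, one value per participant.
compression, expansion = compression_expansion(
    learn_within=[7, 8, 6, 7],
    learn_between=[3, 2, 3, 4],
    ctrl_within=[6, 6, 7, 5],
    ctrl_between=[3, 4, 3, 4],
)
```

Each difference would then be tested against zero, which corresponds to the planned one-tailed comparisons between the learning and control groups reported above.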


Figure 3. Results of Experiment 1 (horizontal axis: Pair Type, Within vs. Between)


2.3  Discussion 

Experiment 1's instantiation of the category
learning  paradigm  described  in  the  Introduction 
within  the  environment  of  Second  Life  produced 
a significant  compression  effect  of  the  sort 
typically  found  in  standard  laboratory  versions, 
suggesting  that  the  category  learning  processes 
tapped  are  substantially  the  same.  While  our 
ultimate  goal  is  to  develop  more  innovative  uses 
of  Second  Life  for  research  on  category 
learning,  we  next  took  advantage  of  some  of  the 
more  straightforward  features  of  the  laboratory 
we  had  already  constructed  there  to  explore  a 
long-standing question about the role of verbal
labels  in  category  learning. 

Normally  when  we  learn  new  categories  we 
simultaneously  learn  words  for  those  categories. 
In  fact,  using  a single  word  to  refer  to  a set  of 
objects  that  vary  on  several  dimensions  has 
been  hypothesized  to  be  central  to  the  category 
learning  process  in  humans  (e.g.,  [8]).  The 
single  label  provides  an  explicit  feature  common 
to  all  category  instances  and  may  support  the 
learning  of  categories  in  important  ways.  To 
rigorously  test  this  claim  requires  the  study  of 
category  learning  in  the  absence  of  verbal 
labels,  and  that  poses  a methodological 
challenge.  We  made  creative  use  of  Second 
Life's  unique  stimulus-building  tools  to  meet  this 
challenge. 




In  Experiment  2 we  designed  a novel  response 
system  to  allow  participants  to  learn  the  same 
categories of objects used in Experiment 1, but
without any verbal labels. This allowed us to
test two interesting questions: (1) Will category
learning  be  more  difficult  under  these  conditions 
compared  to  those  of  Experiment  1?  And,  even 
if  this  is  the  case,  (2)  will  those  participants  who 
do  successfully  learn  the  category  distinction 
exhibit  compression  effects  similar  to  those 
found  in  Experiment  1?  That  is,  will  the 
underlying  mechanism  of  category  learning  be 
the  same  in  the  absence  of  labels? 

3.  EXPERIMENT  2 

3.1  Method 

3.1.1  Participants  A total  of  39  Vassar 
College  undergraduates  either  volunteered  as 
part  of  an  introductory  psychology  class 
requirement  or  were  paid  for  their  participation. 

3.1.2  Materials  The  stimuli  and  categories 
were  identical  to  those  used  in  Experiment  1. 
However,  two  visual  patterns  were  used  in  place 
of the verbal labels “Gex” and “Zof.” These
patterns  were  colorful  designs  similar  to  tie-dyed 
fabric  and  slightly  different  from  each  other  (see 
Figure  4).  They  were  chosen  because  they  are 
extremely  difficult  to  describe  in  any  simple  way 
and  thus  do  not  easily  lend  themselves  to 
description  by  a single  label. 


Figure  4.  Nonverbal  response  buttons  used  in 
Experiment  2 


3.1.3  Procedure  The  procedure  was  identical 
to  that  used  in  Experiment  1 except  that  the 
labels “Gex” and “Zof” were excluded from the
classification  judgments  and  feedback. 
Participants  pressed  one  of  the  two  buttons 
shown  in  Figure  4 to  indicate  which  category 
they  thought  a stimulus  belonged  to  and  were 


simply  told  whether  they  were  correct.  The  right- 
left  positions  of  the  two  response  buttons 
relative  to  each  other  were  randomly  varied  to 
prevent  participants  from  using  the  labels  “right” 
and  “left”  for  the  two  patterns. 

3.2  Results 

Nineteen of the 39 participants failed to meet the
learning criterion used in Experiment 1, and their
data were not included in analyses. A chi-square
test of independence showed that
learning success was related to the presence
(Experiment 1) or absence (Experiment 2) of
verbal labels (χ²(1, N = 63) = 6.58, p = .01).
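
The reported statistic can be checked from the counts given in the two Results sections: 20 of 24 learning-condition participants met the criterion in Experiment 1, and 20 of 39 did so in Experiment 2. The sketch below computes a Pearson chi-square for the resulting 2 x 2 table. It assumes no continuity correction was applied, which the paper does not state; the function name is illustrative.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2 x 2 table.

    table is ((a, b), (c, d)). Returns (chi2, p) with one degree of
    freedom and no continuity correction (an assumption here).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Survival function of the chi-square distribution with df = 1.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Rows: Experiment 1 (labels), Experiment 2 (no labels).
# Columns: met criterion, failed criterion.
chi2, p = chi_square_2x2(((20, 4), (20, 19)))
```

With these counts the statistic comes out at about 6.58 with p close to .01, in line with the values reported above.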

In  order  to  determine  whether  compression  or 
expansion  occurred,  the  data  from  this 
experiment  were  combined  with  the  control 
group  data  from  Experiment  1 and  analyzed  in 
the  same  way.  The  resulting  2 (group: 
nonverbal  vs.  control)  by  2 (pair-type:  within- 
category  vs.  between-category)  analysis  of 
variance  with  repeated  measures  on  the  second 
factor  yielded  a significant  main  effect  of  pair- 
type  (F(1,38)  = 682.612,  p < .001)  and  a 
significant  interaction  between  group  and  pair- 
type  (F(1,38)  = 8.852,  p = .005).  As  shown  in 
Figure  5,  the  pattern  of  the  means  is  identical  to 
that  of  Experiment  1 and,  once  again,  planned 
one-tailed  t-tests  revealed  that  only  the 
compression  effect  was  statistically  significant 
(t(38) = 1.850, p < .05).

Additional t-tests showed that the mean
similarity  ratings  for  the  learning  groups  in 
Experiment  1 (with  verbal  labels)  and 
Experiment  2 (without  verbal  labels)  did  not 
differ  significantly  for  either  the  within-category 
pairs  or  the  between-category  pairs. 

3.3  Discussion 

The  use  of  complex  visual  patterns  in  place  of 
verbal  labels  in  Experiment  2 made  the  category 
learning  task,  otherwise  identical  to  that  used  in 
Experiment  1,  much  more  difficult.  This  is 
consistent  with  results  reported  by  Lupyan, 
Rakison,  and  McClelland  [8]  using  a similar  task. 
As  they  note,  while  it  is  impossible  to  be  certain 
that  participants  were  not  surreptitiously  using 
invented  verbal  labels,  the  fact  that  learning  was 
significantly  more  difficult  using  nonverbal 
category  responses  suggests  that  explicit  verbal 
labels  do  facilitate  category  learning.  It  is 
interesting  that  half  of  our  participants  in  the 
nonverbal condition were nonetheless able to
learn  the  categories  to  a very  high  level  of 
accuracy. 


Figure 5. Results of Experiment 2


However,  our  main  interest  was  in  determining 
whether,  for  those  participants  who  were  able  to 
master  the  category  distinction,  the  compression 
result  obtained  in  Experiment  1 would  still  occur. 
In  fact,  the  compression  effect  for  successful 
learners  was  virtually  identical  in  the  two 
experiments,  suggesting  strongly  that  while 
verbal  labels  may  make  category  learning 
easier,  they  are  completely  unrelated  to  the 
compression/expansion processes associated
with  category  learning.  This  is  consistent  with 
the  idea  that  these  processes  are  indeed 
fundamental  to  the  formation  of  new  category 
concepts. 

4.  CONCLUSION 

These  experiments  on  category  learning  in 
Second  Life  demonstrate  both  its  continuity  with 
standard  laboratory  research  and  its  utility  for 
testing  new  hypotheses.  Our  evidence  that 
Second  Life  taps  the  same  cognitive  processes 
observed  in  laboratory  research  supports  its 
validity  for  further  research  on  learning  and 
cognition. 

Our  next  study,  currently  underway,  makes 
much  greater  use  of  Second  Life's  potential  for 
creating  dynamic,  interactive  stimulus  features 
and  engaging,  goal-oriented  tasks.  We  expect 
this will allow us to explore category learning
processes in ways that are not possible in a
standard laboratory situation but that are
actually significantly more like the real world.
Appendix B: MODSIM World 2010 Conference & Expo Track Chairs and Track Descriptions


Defense  Track 

Chair, Steve Husak, Steve Husak & Associates

Deputy Chair, Thomas J. Sides, Global Maritime Security Systems, Combat Directions Systems
Activity, Dam Neck

The  Hampton  Roads,  Virginia,  area  is  home  for  many  Department  of  Defense  major  commands. 
It  is  also  home  to  the  U.S.  Joint  Forces  Command  and  NATO’s  Allied  Command 
Transformation.  Together,  these  organizations  create  a unique  synergy  that  makes  the  Hampton 
Roads  region  a focal  point  for  the  development  and  use  of  modeling  and  simulation  (M&S)  to 
meet  complex  training  requirements.  The  Defense  Track  provides  academic,  government  and 
industry  participants  the  opportunity  to  address  M&S  technologies  relative  to  their  ability  to 
resolve  or  mitigate  current  and  projected  operational  challenges.  The  Defense  Track  will 
specifically  focus  on  the  following  five  key  aspects  of  M&S:  (1)  currently  available  products  and 
training  tools;  (2)  new  and  emerging  technologies  and  related  tools;  (3)  best  practices  and  case 
studies  of  successful  applications  of  M&S;  (4)  the  use  of  M&S  for  decision-making,  support  and 
risk  assessment;  and  (5)  discussions  and  predictions  about  the  future  of  M&S  for  use  in  defense. 
Other  aspects  of  interest  include  the  development  and  advancement  of  M&S  for:  defense-related 
training,  assessment  and  evaluation;  technology  integration;  protection  of  critical  infrastructure 
and  infrastructure  resiliency;  disaster  (contingency)  planning,  implementation,  response  and 
recovery;  interoperability;  standards;  and  the  development  of  policy  and  strategies  for  managing 
global,  national  and  regional  events. 


Engineering  & Science  Track 

Co-Chair,  C.  Matthew  O’Connor,  Combat  Directions  Systems  Activity,  Dam  Neck 
Co-Chair,  Dr.  Daniel  P.  Schrage,  School  of  Aerospace  Engineering,  Georgia  Tech 
Deputy  Chair,  Kevin  Stenstrom,  Raytheon 

Modeling  and  simulation-based  engineering  and  science  is  rapidly  becoming  an  essential 
scientific  methodology  for  research  (theoretical  and  experimental)  and  development;  concept 
generation  and  consumer  marketing;  product  design  and  manufacturing;  systems  development 
and  integration;  project  management;  systems  engineering;  logistics;  new  methods  for 
verification,  validation  and  uncertainty  quantification;  and  integration  and  interoperability. 
Continuing  advances  in  computational  science  and  networking  technologies  have  made  modeling 
and  simulation  (M&S)  a powerful  and  ubiquitous  tool  for  engineers  and  scientists,  and  have 
made  it  possible  to  vastly  extend  the  range,  depth  and  applications  of  modeling  and  simulation — 
especially  when  the  phenomena  being  investigated  are  not  observable  or  measurements  are 
impractical  or  too  expensive. 
The Engineering & Science Track provides a forum for individuals to meet, exchange and debate
ideas,  network,  foster  future  research  opportunities,  and  learn  from  national  and  international 
experts  on  a wide  variety  of  engineering  and  science  topics  and  issues.  By  exploring  the 
challenges,  applications,  technologies  and  future  directions,  this  track  will  challenge  traditional 
perspectives,  cultivate  new  ideas,  and  foster  dialogue  on  how  M&S  can  address  the  present  and 
future  challenges  facing  engineering  and  science. 


Health  & Medicine  Track 

Chair,  Jennifer  Arnold,  Booz  | Allen  | Hamilton 
Deputy  Chair,  Christine  Shamloo,  Booz  | Allen  | Hamilton 
Deputy  Chair,  Rachel  Spencer,  SiTEL  MedStar  Health 
Deputy  Chair,  Brian  Levine,  SAIC 

The  scale  and  complexity  of  the  health  care  sector  are  almost  as  daunting  as  the  unprecedented 
levels  of  change  this  sector  is  facing.  It  is  becoming  increasingly  clear  that  modeling  and 
simulation  (M&S)  can  be  instrumental  in  addressing  the  multi-faceted  challenges  health  care  is 
facing.  The  Health  and  Medicine  Track  recognizes  that  M&S  tools,  techniques  and  standards  are 
playing  an  increasingly  important  role  in  improving  patient  safety.  M&S  is  also  changing  the 
way  medicine  is  taught  and  practiced,  how  health  care  professionals  are  trained,  the  development 
of  solutions  to  health  problems,  how  we  respond  to  public  health  issues,  and  the  conduct  of  basic 
and  applied  medical  research  in  a variety  of  areas  including  genetics,  neuroscience  and 
population  dynamics.  Fundamental  to  the  application  of  M&S  is  the  underlying  assumption  that 
insight  into  the  behavior  of  a system  can  be  developed  or  enhanced  from  a model  that  adequately 
represents  a selected  subset  of  the  health  care  system’s  attributes.  This  track  provides  a forum  for 
individuals  to  meet,  exchange  and  debate  ideas,  network,  foster  future  research  opportunities, 
and  learn  from  national  and  international  experts  on  a wide  variety  of  health  and  medicine  topics 
and  disciplines.  By  exploring  applications,  technologies  and  future  directions,  this  track  will 
challenge  traditional  perspectives,  cultivate  new  ideas,  and  foster  dialogue  on  how  M&S  can 
address  the  present  and  future  challenges  health  and  medicine  face. 


Homeland  Security  & First  Responders  Track 

Chair,  Bruce  Milligan,  Booz  | Allen  | Hamilton 
Deputy  Chair,  Jay  Allen,  OUSD  (P&R),  ADL  Initiative 
Deputy  Chair,  Tammy  Van  Dame,  Combat  Directions  Systems  Activity,  Dam  Neck 

One  key  factor  that  first  responders  and  those  involved  with  homeland  security  have  in 
common  is  that  the  level  of  their  professional  training,  and  their  ability  to  respond  quickly  and 
effectively  to  threats  to  our  national  security,  natural  disasters  and  other  catastrophic  events, 
could  affect  any  citizen  of  this  nation.  Therefore,  effective 
training  using  the  most  robust  and  best-developed  methods  and  technologies  is  a critical 
component  to  the  personal  safety  of  each  and  every  American.  The  Homeland  Security  & First 
Responder  Track  focuses  on  computer  models,  simulations  and  serious  games  that  relate  to 
homeland  security.  Many  such  tools  are  currently  in  use  or  in  development  by  groups 
ranging  from  emergency  preparedness  personnel  (including  police,  firefighters,  public  works, 
National  Guard  and  others)  to  those  involved  in  domestic  counter-terrorism  or 
counterintelligence,  border  security,  maritime  and  aviation  security,  and  many  similar  realms. 
Conference  speakers  will  range  from  well-known  authors  and  game  designers  to  those  who  have 
been  in  the  front  lines  of  disaster  response,  law  enforcement  and  homeland  security. 


Human  Dimension  Track 

Chair,  Dr.  Kara  Latorella,  Crew  Systems  & Aviation  Operations  Branch,  NASA  Langley 
Deputy  Chair,  Phil  Jones,  MYMIC,  LLC 

Modeling  Human  Behaviors  & Interactions:  Understanding  the  human  dimension  cuts  a broad 
swath,  from  individual  performance  and  thought  to  the  behaviors  of  humans  in  complex 
human/machine  systems,  cultures  and  personal  interactions.  This  track  focuses  on  modeling  and 
simulation  as  it  relates  to  individual  human  behavior  (information-seeking  and  monitoring, 
decision-making  and  controlling)  and  the  influence  of  effect  and  environmental  conditions  on 
behavior,  as  well  as  the  interaction  of  individuals,  especially  as  participants  in  complex 
human/machine  systems,  and  human  social  networks  and  organizations.  This  track  will  address 
the  utility  of  models  and  simulations  in  terms  of  their  application  to  solving  design  and  analysis 
problems  such  as:  characterizing  individual  differences;  predicting  human  performance  and 
errors;  designing  and  evaluating  human/system  integration  and  human/automation  interaction; 
team  construction  and  performance;  and  predicting  communication  dynamics,  or  other  emergent 
social  behaviors.  Innovative  approaches  to  requirement  definition,  validation  and  verification, 
and  data  collection  and  analysis  will  also  be  presented. 


K-20  STEM  Education  Track 

Chair,  Mark  Clemente,  Educator-in-Residence,  National  Institute  of  Aerospace/Virginia  Beach 
City  Public  Schools 
Deputy  Chair,  Dr.  Vincent  Charles  Betro,  STEM  Outreach  Coordinator,  University  of 
Tennessee,  SimCenter  at  Chattanooga:  National  Center  for  Computational  Engineering 

Accepting  the  Challenge:  The  K-20  Science,  Technology,  Engineering  and  Math  (STEM) 
Education  Track  will  be  a forum  organized  to  draw  national  attention  to  the  need  to  educate  and 
train  the  next  generation  of  engineers  and  scientists  in  the  theory  and  practice  of  modeling  and 
simulation-based  (M&S)  engineering  and  science  and  the  myriad  of  associated  technologies. 
Maintaining  our  competitive  advantage  and  full  utilization  of  M&S-based  engineering  and 
science  requires  fundamental  changes  in  America’s  K-20  education  system,  especially  in  the 
teaching/learning  of  STEM,  and  the  integration  of  M&S  into  the  K-20  curriculum,  both  as 
content  and  as  an  instructional  strategy.  The  forum’s  purpose  is  to  create  a dialogue  that  will  lead 
to  the  development  of:  (1)  a national  plan  for  integrating  M&S-based  engineering  and  science 
into  K-20  education;  (2)  a literacy  framework  for  M&S;  and  (3)  a research  agenda.  This  forum 
brings  together  nationally  known  speakers  and  panelists  from  academia,  business,  education, 
government  and  industry,  and  will  include  some  of  the  nation’s  foremost  thinkers  and 
practitioners.  This  forum  is  designed  for  K-20  STEM  educators  and  administrators;  legislators; 
state  and  local  school  board  members  and  superintendents;  college  faculty  and  administrators; 
engineers  and  scientists;  state  and  federal  policy  makers;  business,  corporate  and  civic  leaders; 
and  members  of  workforce  boards,  consortia,  partnerships  and  alliances. 


Serious  Games  & Virtual  Worlds  Track 

Chair,  Dr.  Benjamin  Bell,  CHI  Systems,  Inc. 
Deputy  Chair,  Dr.  Winston  "Wink"  Bennett,  Warfighter  Readiness  Research  Division, 
Human  Effectiveness  Directorate,  Air  Force  Research  Laboratory 
Committee  Members:  Paul  Cummings,  ICF  International;  Dr.  Jerzy  Jarmasz,  Defence 
Research  & Development  Canada;  Dr.  Stephanie  Lackey,  University  of  Central  Florida 
Institute  for  Simulation  & Training;  Dr.  Sae  Schatz,  University  of  Central  Florida  Institute 
for  Simulation  & Training;  and  Lt.  Joel  Walker,  USAF,  Air  Force  Research  Laboratory 

Rapid  advances  in  computer  hardware  and  software  continue  to  transform  technologies  and  blur 
the  distinctions  between  simulations  and  games.  Paralleling  this  trend,  aggressive  progress  in 
bandwidth,  compression  and  animation  technologies  has  moved  virtual  worlds  from  the 
shadows  of  technophilic  oddity  into  the  bright  light  of  productive  work.  This  cross-cutting  track 
focuses  on  the  related  but  distinctive  areas  of  serious  games  and  virtual  worlds — disciplines  that 
share  a focus  on  simulation  and  interaction,  but  possess  distinctive  approaches,  technologies  and 
cultures.  The  Serious  Games  & Virtual  Worlds  Track  will  explore  all  dimensions  of  serious 
games  and  virtual  worlds,  including  those  relating  to  the  broader  MODSIM  World  2010  tracks: 
Defense,  Engineering  & Science,  Health  & Medicine,  Homeland  Security  & First  Responders, 
The  Human  Dimension,  and  K-20  STEM  Education.  By  exploring  the  applications,  technologies 
and  future  directions  of  serious  games  and  virtual  worlds,  this  track  intends  to  challenge 
traditional  perspectives,  cultivate  new  ideas,  and  foster  dialogue  focusing  on  how  these 
capabilities  can  extend  the  reach  of  modeling  and  simulation. 